
Human speech may have a universal transmission rate: 39 bits per second - headalgorithm
https://www.sciencemag.org/news/2019/09/human-speech-may-have-universal-transmission-rate-39-bits-second
======
ilamont
_But, he says, instead of being limited by how quickly we can process
information by listening, we’re likely limited by how quickly we can gather
our thoughts. That’s because, he says, the average person can listen to audio
recordings sped up to about 120%—and still have no problems with
comprehension._

Some years ago I worked on an accessibility project for an app and website
designed for people with disabilities. One of the team members had low vision,
and used a screen reader that must have been set to 3x or even higher. I
usually listen to YouTube and podcasts at 1.5-2x and I could barely understand
the audio. He seemed surprised, which indicated to me that 3x+ was the norm
for people in his circle.

I wonder whether his ability was trained through years of using fast screen
readers, whether a lower visual processing load leads to better audio
processing, or whether there's some other explanation.

~~~
knzhou
I've known a lot of people that push podcasts, videos, and audiobooks to
extreme speed. I knew a guy who'd turn video speed up to 8x so he could binge
watch a season of generic anime in an hour flat. I knew a girl who'd get
through paperback romance novels by scanning each page diagonally, in 10
seconds each. And here in this thread we have a lot of people bragging along
the same lines.

I just don't get the point. If you can process content much faster than it was
meant to be played, it doesn't mean you're learning much faster than you
otherwise could; it means the novel information density is low. Any content
that can be sped up that much without loss is not worth listening to in the
first place. You're just skipping the trite cliches, filler, and obvious facts.

I can read fast, and I typically go through fluffy NYT bestseller nonfiction
at 600 WPM. But when I do this I constantly have a sneaking suspicion that I'm
just wasting my time. When I read a good book full of new ideas, I barely go
at 150 WPM, but the time always feels well-spent.

~~~
dredmorbius
Exceedingly _slow_ narration, particularly what's normal for audiobooks, is
annoying to me because it's slower than I process words. It's like walking
with someone whose pace is far slower than your natural gait -- it takes more
energy and concentration to slow down. It's why slow-talkers are so annoying.

This isn't "how fast can I go through this" but "what is a comfortable pace"?

So I bump the speed up, though usually fairly modestly: 1.25x - 1.5x is
generally enough.

I've noticed that preferred speeds vary tremendously with the quality of the
work and speaker -- high-density information and an exceedingly good speaker,
and I'll slow down. Slapdash redundant content and poor speaker, I'll speed
up.

The degree of polish in the production matters tremendously. I've listened to
CPG Grey's YouTube videos (highly polished) and podcasts (a lot of chit-chat
with his co-host). The videos work well at normal speed, or perhaps slightly
sped up. The podcasts I find nearly unlistenable, though they improve at much
higher speeds (1.75x - 2x).

~~~
SilasX
Yes, and the painful slowness becomes even more evident if you speed it up to
the 1.25/1.5x range, get comfortable with it, and then go back to 1x. IME, it
sounds like the speaker is going over-the-top to enunciate, like you're a
small child or have learning disabilities or something.

~~~
dredmorbius
Audiences vary, and many audiobook listeners are visually impaired and may
_also_ have hearing problems. Pitching the default to them seems a fair deal,
particularly since speeding up is so straightforward.

------
lenepp
Apologies for being harsh, but this kind of thing is the phrenology of our
time. I know it's utterly conventional to think this way about language in
some circles that present themselves as doing legitimate science, but the view
that you can calculate the amount of information in human speech, except in a
super-technical sense that doesn't match any of the reporting on this study or
the way people are interpreting it, has to be called out for the total
nonsense that it is. It doesn't bear a moment's honest reflection.

And yes, I know information theory. It's language that these folks - many of
them prominent and celebrated within their utterly normalized professions,
just like in the days of phrenology - are fundamentally mistaken about. What
quantity of information do you think there is in the word "trump," for
instance? Is it the same over time, to bring up just one feature of how this
funny thing called context informs human speech?

Wittgenstein's Philosophical Investigations is a good place to start if
anyone's interested in understanding this issue.

~~~
bagacrap
They aren't talking about the semantic information of the word "trump". They
explain the methodology for calculating information, and it's per syllable
(based on the number of distinct syllables that are part of the language's
phonetics). So, for English speakers, 'trump' has exactly 7 bits in it. That
exact syllable may or may not exist in another language, but if it does, the
same monosyllabic word "trump" would have a different number of bits to a
speaker of that language. Maybe next time RTA?
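The per-syllable bit count here is an entropy over the language's syllable
distribution. A minimal sketch of that calculation on toy data (plain unigram
entropy; the paper actually estimates conditional bigram entropy over large
corpora):

```python
import math
from collections import Counter

def bits_per_syllable(syllable_stream):
    """Shannon entropy of a syllable stream: -sum p * log2(p).
    A toy stand-in for the paper's corpus-based estimate."""
    counts = Counter(syllable_stream)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy corpus: uniform over 4 syllables -> exactly 2 bits/syllable.
print(bits_per_syllable(["ba", "na", "na", "ba", "ti", "ku", "ti", "ku"]))  # 2.0
```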

~~~
EpicEng
>Maybe next time RTA?

I think it's you that has missed the point. Syllables have a very loose
correlation to information. So great; we can stream out 39 bits' worth of
syllables per second. In what way does that describe how information-dense
those syllables are? Context matters here.

~~~
rtkwe
I think the fact that context matters so much is why we don't try to quantify
it. The word 'trump' can convey a lot of meaning or next to nothing. E.g., in a
card game the word trump can convey a lot of information about the state of
play, and your reaction to it, to your competitors. It doesn't take any longer
to say, and in the context of the game may take less time to think up as well.

------
codeulike
Early on when Information Theory was emerging, there were attempts to measure
the bandwidth of consciousness. They reckoned about 18 bits per second or
less, which sounds very low.

Tor Norretranders book, The User Illusion, mentions some of the research:

W R Garner and Harold W Hake, "The Amount of Information in Absolute
Judgments" - Psychological Review 58 (1951) - they attempted to measure
people's ability to distinguish stimuli (such as light and sound) in bits.
Result: 2.2 to 3.2 bits per second.

W E Hick, "On the Rate of Gain of Information" - Quarterly Journal of
Experimental Psychology 4 (1952) - this experiment measured how much
information a person could pass on if they acted as a link in a communication
channel. That is, faced with a series of flashing lights, subjects had to
press the right keys. Result: 5.5 bits per second.

Henry Quastler, "Studies of Human Channel Capacity" - Information Theory,
Proceedings of the Third London Symposium (1956). Measured how many bits of
information are expressed by a pianist while pressing keys on a piano. Result:
25 bits per second.

J R Pierce, "Symbols, Signals and Noise" (Harper 1961) - used experiments
involving letters and symbols. Result: 44 bits per second.

Discussion of the research, Tor Norretranders book, and what the research may
have missed here:

[http://memebake.blogspot.com/2008/08/straw-dogs-and-bandwidt...](http://memebake.blogspot.com/2008/08/straw-dogs-and-bandwidth-of.html)

------
Mathnerd314
> instead of being limited by how quickly we can process information by
> listening, we’re likely limited by how quickly we can gather our thoughts.
> That’s because, he says, the average person can listen to audio recordings
> sped up to about 120%—and still have no problems with comprehension. “It
> really seems that the bottleneck is in putting the ideas together.”

Glad this paragraph was in the article, clears up their methodology. I wonder
if it applies to writing too, or if skilled writers work faster.

~~~
Aperocky
Really depends on language, if you're writing java, you'll be putting out a
lot more than that due to how stupidly verbose it is.

~~~
jefftk
_> if you're writing java, you'll be putting out a lot more than that due to
how stupidly verbose it is_

Being "verbose" means that each letter you type communicates fewer bits of
information. If the bottleneck is putting ideas together then you would expect
someone writing in a more verbose language to type more letters per minute but
still take a similar amount of time to communicate the idea.

In practice most Java programmers are using IDEs with good auto-completion,
though, so they don't actually need to type as many letters as you'd think.

~~~
the_af
Like the sibling comment mentioned, it seems verbosity hinders _reading_
comprehension rather than writing. Many IDEs understand this and hide some of
the boilerplate.

This raises the question: if the IDE autocompletes the boilerplate for you,
and also hides it, why is it needed in the first place?

~~~
liability
Consider the verbosity of XML compared to s-expressions. <html>...</html> vs
(html ...)

The latter can trivially be used to output the former. The conclusion is
obvious; some of these formats are objectively more verbose than others while
having equivalent expressive power.
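The "trivially be used to output the former" claim can be sketched with a
hypothetical converter (the representation and names here are illustrative,
not from any real library):

```python
def sexpr_to_xml(node):
    """Render a nested-tuple s-expression like ('html', ...) as XML.
    Strings are text content; tuples are (tag, *children)."""
    if isinstance(node, str):
        return node
    tag, *children = node
    inner = "".join(sexpr_to_xml(c) for c in children)
    return f"<{tag}>{inner}</{tag}>"

print(sexpr_to_xml(("html", ("body", "hi"))))  # <html><body>hi</body></html>
```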

~~~
saagarjha
Interestingly, depending on the context I find one or the other to be more
readable. XML is great when you've got a lot of content, because the closing
tag provides additional context about where you are, but it's not as great
when you don't have a lot of data, since the closing tag is just visual
clutter.

------
n1231231231234
This is really cool. I am working in a related area and I think most of us
have assumed that on average, the information rate is 'about the same' for the
languages across the world. So it's exciting to see that their results confirm
this assumption.

Two qualifying remarks.

1) The 'about the same' is important. Even in their data, there is still quite
some variance. They found an average of 39 bits, with a stdev of 5. That means
that about 1/3 of the data falls outside the range of 34-44 bits.
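The "about 1/3 outside 34-44 bits" figure follows from the normal
distribution, where roughly 68% of the mass lies within one standard deviation
of the mean:

```python
import math

mean, stdev = 39, 5
# P(|X - mean| <= stdev) for a normal distribution = erf(1/sqrt(2)) ~ 0.683
within_one_sigma = math.erf(1 / math.sqrt(2))
print(f"within {mean - stdev}-{mean + stdev} bits: {within_one_sigma:.1%}")
print(f"outside: {1 - within_one_sigma:.1%}")  # ~31.7%, i.e. about 1/3
```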

2) Which brings me to the uniform information density (UID) hypothesis.
According to the UID, the language signal should be pretty smooth wrt how
information is spread across it. For many years, the UID was thought to be
pretty absolute: Even across a unit like a sentence, it was thought that
information will spread pretty evenly. Now, there is an increasing amount of
research showing that, especially in spontaneous spoken language, there is a
lot more variance within the signal, with considerable peaks and troughs
spread across longer sequences.

~~~
godelski
Why did everyone assume it would be the same on average? This seems weird to
me.

Also, can you explain more about how the information density was calculated?
Anything at the bit level seems crazy small to me. Words convey a lot of
information. They cause your brain to create images, sounds, emotions, smells,
etc. I guess we're calling language a compression of that? But even still,
bits seems small.

~~~
n1231231231234
> Why did everyone assume it would be the same on average? This seems weird to
> me.

(see edit below; but i leave this up; it might be interesting, also) you mean
that even for smaller sequences, the UID holds, right? the assumption was that
even for a single sentence, there are a lot of ways to reduce or increase
information density so that you get a smoother signal. e.g.: "It is clear that
we have to help them to move on.", you could contract it to "it's clear we
gotta help them move on" and contract it even further in the actual speech
signal ('help'em'). or you could stretch it: "it is clear to us that we
definitely have to help them in some way to move on", or alike. the assumption
was that such increases / decreases would even be done to 'iron out' the very
local peaks and troughs, particularly in speech.

bits: yeah, that took me a while to get used to, as well. the authors used
(conditional) entropy as a way to measure information density (which is a good
measure in this instance imv). and bits is just per definition the unit that
comes out of information theoretical entropy:
[https://en.wikipedia.org/wiki/Entropy_(information_theory)](https://en.wikipedia.org/wiki/Entropy_\(information_theory\))
. btw: while technically possible, i don't think that the comparison in the
summary article between 39 bits in language and a xy bit modem is a helpful
comparison. bits in the context of entropy are all about occurrence and
expectation in a given context. bits of a modem/in CS, they represent a low
level information content for which we do not check context and expectation.
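for the curious, a minimal sketch of the conditional-entropy measure mentioned
above, using the identity H(next | prev) = H(pairs) - H(prev) on toy data (the
paper's estimate of course uses large corpora, not a stream this small):

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a Counter of outcomes."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(syllables):
    """H(next | previous) = H(bigrams) - H(unigrams), estimated from counts."""
    pairs = Counter(zip(syllables, syllables[1:]))
    singles = Counter(syllables[:-1])
    return entropy(pairs) - entropy(singles)

# If each syllable fully determines the next, the conditional entropy is 0:
print(conditional_entropy(["a", "b", "a", "b", "a", "b", "a"]))  # 0.0
```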

edit: ah, i realise you are asking why most in our community assumed that this
universal rate applied across languages, right?

i guess the intuition was that all of us humans, no matter what language we
speak, use the speech signal to transmit and receive information and that all
of us have the same cognitive abilities. so the rate at which we convey
information should be about the same. sure, there are probably differences
according to some factors (spoken vs written language, differences in
knowledge between speakers, etc.). but when the only factor that differs is
English vs Hausa, esp. in spontaneous spoken language, then the information
rate should be about the same.

~~~
godelski
> esp. in spontaneous spoken language, then the information rate should be
> about the same.

This is entirely non-intuitive to me. I would think with language evolving
that some would be faster than others. If language starts as conveying
extremely simple thoughts then it should take longer to convey certain things.
I would then assume that as the language develops it gets better at conveying
ideas. I would think that thoughts could go much faster than how we process it
with language. Like I have constant thoughts that are really fast and can be
complex. There's no internal dialogue there. But when I think with an internal
dialogue it is much slower.

------
SXX
So here is my own experience. I was an avid audiobook fan for the last 3
years, and a while ago some guy on reddit told me about how he listens to
books on Audible using the high-speed option, like 2.x. I had never tried that
before last summer, since at higher speeds speech became incomprehensible to
me.

What this guy told me is that it just takes time to adjust to it. So I
basically started to listen to books at a slightly higher speed. Then I
gradually increased it, and in a few days I could handle 2.0x speed no problem
while listening to really complex fantasy (Malazan Book of the Fallen [1]).
After two weeks I could handle 2.5x without a problem.

In the beginning it was harder to comprehend at high speed while walking or
crossing the street, since I lost attention, but in a few months I could do
anything while listening without missing any information or emotions of the
narrator.

To give an example of how far this can go: this spring I was listening to The
Expanse audiobook [2] at 4.0x speed. With some effort I could go even faster,
like 5.x, in the case of these particular books, but obviously can not keep up
for long.

I still usually listen to books at 2.0-3.0x depending on the narrator and the
quality of the audio, and this skill doesn't go away even if I have extended
time between books, like a month or so.

[1] [https://www.audible.com/pd/Reapers-Gale-
Audiobook/B00M4LRBY6](https://www.audible.com/pd/Reapers-Gale-
Audiobook/B00M4LRBY6)

[2] [https://www.audible.co.uk/pd/Abaddons-Gate-
Audiobook/B00T6NZ...](https://www.audible.co.uk/pd/Abaddons-Gate-
Audiobook/B00T6NZFWK)

UPD: Edit. s/can keep up/can not keep up/

~~~
colechristensen
One thing I'd also like to develop / wish was integrated into Audible and the
like is silence trimming. Some speakers leave outsized pauses in their
narration, which could be significantly shortened, effectively increasing
speed with less distortion.

I have the opposite problem: I have trouble paying attention to an audiobook
at 1x. I get bored in between words and my mind wanders, making it very
difficult to keep track of what is being said (as in, I hear individual words
but have trouble keeping sentences in memory when everything comes too
slowly).

I wish I had realized this in university and had been able to somehow record
and playback lectures at 2x. I always got so little out of lectures because
the information wasn't coming in fast enough for me to process correctly.

~~~
thegranderson
Overcast (a podcasting app) has great features to optimize the high speed
listening experience. They have variable speed, a great silence trimmer, and a
voice boost that makes speech clearer.

~~~
dwighttk
I just wish I could listen to arbitrary audio with Overcast. As it is I have a
blog set up to feed Overcast audio that I give it, but it feels super clunky.

------
tkfu
I'm a bit confused, here. (I went and looked at the original paper.) They
estimated information density for each of the subject languages as a whole, on
average:

> In parallel, from independently available written corpora in these
> languages, we estimated each language’s information density (ID) as the
> syllable conditional entropy to take word-internal syllable-bigram
> dependencies into account.

But the experiment uses the same text translated into each language! Why
introduce this extra variable (and source of error) of estimated language-wide
information density, if you are controlling your experiment such that you have
the exact same information encoded in each language? That is to say, why use
an _estimated_ information density when you could measure it exactly for the
texts that are being spoken? Or, conversely, why go to all the trouble of
having the speakers read the same text translated into each language, if you
aren't going to make use of that symmetry?

~~~
canjobear
Information depends on probability. If something is very probable then it
doesn’t have much information (because you already saw it coming). If
something is improbable then it has a lot of information.

In the paper they want to know how much information is in a syllable in
context. To do that they need to know the probability of each syllable given
the previous syllable. To estimate that probability distribution, you need to
look at a lot of text, much more than just the passages that the authors used
to measure speech rate.
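The probability-to-information relationship described here is just the
surprisal, -log2(p); a tiny illustration:

```python
import math

def surprisal(p):
    """Information content in bits of an event with probability p."""
    return -math.log2(p)

print(surprisal(0.5))    # 1.0 bit: as informative as a fair coin flip
print(surprisal(1 / 8))  # 3.0 bits: an improbable syllable carries more
```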

------
tripzilch
Claude Shannon (of information theory fame) did similar research with his 1951
paper "Prediction and Entropy of Printed English". He used a particularly
clever idea: leaving out words or letters from English text, he then measured
how accurately people could predict the missing text, and used those numbers
in a statistical information-theoretical analysis to estimate the information
density at about 9-11 bits (IIRC) per letter.

~~~
mizaru
Looked it up, 11.82 bits per word, 2.62 bits per letter.

~~~
tripzilch
Oh! Yes you're right. My mistake. That makes a lot more sense too, it would be
weird if a letter was more bits than a byte, hm? :-)

------
ShinyObject
The researchers obviously have to keep the scope narrow in order to get
numbers at all.

That said, we should be aware that a tech nerd audience will find simple
answers to complex non-tech questions appealing, and we should not over-
estimate our understanding here just because we have a number.

There is a large amount of data transmitted through sub-communication and
context, particularly during an in-person interaction, which is what people
are wired for. Overall tone, body language, eye contact, and various social
cues make up the bulk of data being transferred in many interactions. There's
a reason why talking to some people feels exhausting and others invigorating,
and it's not just the transcript.

~~~
knzhou
We can avoid reading too much into the study by just remembering the error
bars. It's not like 39 is a universal constant. It's more like 39 with a
standard deviation of 6. That's a wide spread, but it's _less_ wide than the
spread you get from syllable rate alone, and that's all the study
quantitatively tells us.

------
holy_city
I'm not sure what the review process is for the source, but the paper [1] is
pretty interesting. Lots of cool findings/references in there like:

\- There is a measurable difference in information density based on the sex of
the speaker

\- Syllables were chosen as the base unit of measurement because morphemes
(words) are too big/linguistically varied and phonemes (sound equivalent of
letters) are too small and likely to be dropped in regular speech. I'd like to
see the same analysis using phonemes to see how it changes, especially between
dialects.

[1]
[https://advances.sciencemag.org/content/5/9/eaaw2594](https://advances.sciencemag.org/content/5/9/eaaw2594)

------
jefftk
Here's the actual paper:
[https://advances.sciencemag.org/content/5/9/eaaw2594](https://advances.sciencemag.org/content/5/9/eaaw2594)

------
mattr47
In my initial Army job I had to learn how to copy Morse code. I got up to a
respectable 23 groups per minute (a group is 5 characters). If I remember
right, beyond about 13-15 GPM, once the code stopped transmitting I would
still write for another 20 seconds or so. It was not something I consciously
did, it just backed up in my brain.

------
yarg
I doubt that there's a universal rate - a universal mean (within the context
of a shared language) makes more sense.

Every conversation acts as its own handshaking algorithm from which context is
derived, and contexts will vary greatly in terms of amount of language
required to convey concepts.

Jargon rich conversations between experts have the potential to transfer
information at a rate far greater than average.

------
JackFr
So basically human speech processing is a noisy channel with limited
bandwidth. If the language we encode in is heavily compressed, we need to
transmit with fewer errors; but if it has a low rate of compression (or even
redundancy), we can support a higher raw transmission rate and correct for
more errors.
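The redundancy/error-correction tradeoff can be sketched with the simplest
possible scheme, a 3x repetition code (an illustrative stand-in, not anything
from the paper):

```python
def encode(bits):
    """3x repetition code: triple each bit (no compression, high redundancy)."""
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(coded):
    """Majority vote over each triple corrects any single flipped bit."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0 for i in range(0, len(coded), 3)]

sent = encode([1, 0, 1])
sent[1] ^= 1  # channel noise flips one bit
print(decode(sent))  # [1, 0, 1] -- recovered despite the error
```

The cost is that the channel carries three symbols for every one of payload,
which is the compression-vs-robustness tradeoff the comment above describes.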

Which is kind of neat. Thank you Claude Shannon!

------
SketchySeaBeast
>Scientists started with written texts from 17 languages, including English,
Italian, Japanese, and Vietnamese. They calculated the information density of
each language in bits—the same unit that describes how quickly your cellphone,
laptop, or computer modem transmits information.

I feel like the whole "bits" calculation is a neat way to get into the media,
but not actually related to "information density".

Edit: Been informed I'm deeply ignorant on Information Theory.

~~~
furgooswft13
Human speech uses the extremely dense lossy compression method called "A
Lifetime of Experience and Biases"

~~~
burnte
It's only lossy when the decoder's dictionary doesn't have a similar volume of
data points to properly reinflate the data stream. Things that didn't make
sense to me as a teen make far too much sense now because my experience can
properly contextualize the old data.

~~~
furgooswft13
Or lossy when the decoder has a different volume of data points and extracts
totally different information or even info that never existed in the first
place. Kinda like those anime/retro game upscaling filters, except with way
higher variance on what comes out between decoders, or the same decoder at
different times. Gives me an idea for a new JPEG decoder with a floating ADC
as input.

------
fsiefken
The constructed language Ithkuil is, among other things, designed to have a
higher transmission rate. Here are documents and ideas about the upcoming new
2.0 language version:

[http://www.ithkuil.net/new_morpho-
phonology_v_0_8.pdf](http://www.ithkuil.net/new_morpho-phonology_v_0_8.pdf)

[http://www.ithkuil.net/phonotaxis_v_0_4.pdf](http://www.ithkuil.net/phonotaxis_v_0_4.pdf)

[http://www.ithkuil.net/roots_v_0_1.pdf](http://www.ithkuil.net/roots_v_0_1.pdf)

[http://www.ithkuil.net/VxCs_Affixes_v_0_2.pdf](http://www.ithkuil.net/VxCs_Affixes_v_0_2.pdf)

[http://www.ithkuil.net/script_presentation_0_2.pdf](http://www.ithkuil.net/script_presentation_0_2.pdf)

[https://www.reddit.com/r/Ithkuil/comments/bvk1t5/tnil_number...](https://www.reddit.com/r/Ithkuil/comments/bvk1t5/tnil_numbering_system/)

[https://www.reddit.com/r/Ithkuil/comments/29wzfq/interesting...](https://www.reddit.com/r/Ithkuil/comments/29wzfq/interesting_video_touches_on_information_density/)

~~~
kaoD
If your language is complex enough (and it's my understanding that Ithkuil
is[0]) that the emitter takes more time to translate thoughts into it, the
transmission rate is not going to rise regardless of information density
(remember, rate is over time).

I guess non-real-time communication is different, but in that case (e.g. in
written English, which is limited by the medium's rate) the _reception_ rate
is the main factor, which is probably not too far off the transmission rate.

I'd say Ithkuil is designed for information density and my guess is its actual
max rate is pretty similar to the submission's 39bps.

[0] IIRC not even its creator is a fluent speaker.

------
6gvONxR4sf7o
From the paper [0]:

>Abstract

>Language is universal, but it has few indisputably universal characteristics,
with cross-linguistic variation being the norm. For example, languages differ
greatly in the number of syllables they allow, resulting in large variation in
the Shannon information per syllable. Nevertheless, all natural languages
allow their speakers to efficiently encode and transmit information. We show
here, using quantitative methods on a large cross-linguistic corpus of 17
languages, that the coupling between language-level (information per syllable)
and speaker-level (speech rate) properties results in languages encoding
similar information rates (~39 bits/s) despite wide differences in each
property individually: Languages are more similar in information rates than in
Shannon information or speech rate. These findings highlight the intimate
feedback loops between languages’ structural properties and their speakers’
neurocognition and biology under communicative pressures. Thus, language is
the product of a multiscale communicative niche construction process at the
intersection of biology, environment, and culture.

[0]
[https://advances.sciencemag.org/content/5/9/eaaw2594](https://advances.sciencemag.org/content/5/9/eaaw2594)

------
jgalt212
The Micro Machines commercial guy was transmitting at at least 900 baud.

[https://www.youtube.com/watch?v=yDBvEo1s6A4](https://www.youtube.com/watch?v=yDBvEo1s6A4)

~~~
rzzzt
John Moschitta held the title of World's Fastest Talker at one point:
[http://nymag.com/speed/2016/12/is-the-micro-machines-guy-
sti...](http://nymag.com/speed/2016/12/is-the-micro-machines-guy-still-the-
fastest-talking-man-on-the-planet.html)

------
Wistar
I wonder how much face-to-face communication adds to the effective
transmission rate where the physical gestures, body language, winks, and other
non-verbal communication are taken into account.

------
stackingup
As an interesting side note, ham radio operators heavily use a digital mode
called PSK31. It stands for "Phase Shift Keying, 31 Baud".

As I understand it, the 31 bit/s transmission rate was chosen because it is
close to the entropy that operators can generate by typing on their keyboards.
PSK31 does not transmit 8 bit bytes, but instead uses what they call a
Varicode, a kind of Fibonacci code. More frequent characters are encoded using
fewer bits, thus the encoded bit rate is an approximation of the entropy in
the text stream.
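The idea that a Varicode-style prefix code approaches the entropy of the text
can be sketched like this (toy text; the real Varicode table is not reproduced
here):

```python
import math
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the the the"
freqs = Counter(text)
total = sum(freqs.values())

# Entropy lower-bounds the average code length of any prefix code,
# and a well-designed variable-length code (like Varicode) gets close to it.
entropy = -sum((c / total) * math.log2(c / total) for c in freqs.values())
print(f"{entropy:.2f} bits/char vs 8 for fixed-width bytes")
```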

------
mchinen
Seems reasonable, from what is known about the capacity of speech, although
extra cool if there is really some universal number we converge on. Kuyk and
Kleijn found the upper bound of the speech channel to be 100 bits per second,
in agreement with the lexical rate of about half that in "ON THE INFORMATION
RATE OF SPEECH COMMUNICATION"
([https://ieeexplore.ieee.org/document/7953233](https://ieeexplore.ieee.org/document/7953233))
using an interesting method based on mutual information.

Some highlights from the introduction that are relevant:

 _Broadly speaking, two approaches to measuring the information rate of speech
exist: the linguistic approach, and the acoustic approach. The linguistic
approach describes speech as a sequence of discrete perceptual units such as
phonemes, words, or sentences. Taking the average talking speed as 12 phonemes
per second [3], and using the English phoneme probabilities tabulated in [4],
the lexical information rate is approximately 50 b /s. When the dependencies
of the phonemes are accounted for the rate will be decreased further. The
lexical information rate does not include information about talker
identification, emotional state, and prosody. However, these variables vary
relatively slowly in time and contribute little to the overall information
rate. As an example, [5] estimated that the total amount of talker-specific
information (e.g., age, accent, sex) was of the order of 30 bits_
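The back-of-envelope arithmetic behind that ~50 b/s lexical rate, with the
phoneme inventory size as a rough assumption on my part:

```python
import math

phonemes_per_sec = 12  # average talking speed cited above
inventory = 44         # rough count of English phonemes (an assumption here)

# A uniform distribution gives an upper bound on bits per phoneme.
uniform_bits = math.log2(inventory)  # ~5.46 bits/phoneme
print(f"upper bound: {phonemes_per_sec * uniform_bits:.0f} b/s")
# The tabulated, non-uniform phoneme probabilities lower the entropy,
# which is how the quoted ~50 b/s lexical rate comes about.
```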

------
nickjj
I think it also really comes down to what videos or audio you listen to. In
other words, the person delivering the audio and its quality.

I listen to most non-entertainment videos / podcasts at 1.5x, and I would say
about 80% of them are completely OK to comprehend, in the sense of "can I
easily listen to this without struggling to figure out what they are saying at
a language level". But as soon as I try 2x, that drops down to maybe 30-50%,
because the person isn't speaking clearly enough, they have an accent that
overpowers any comprehension ability on my part, or the audio quality is too
poor and introduces too many artifacts.

Sometimes I ask my friends to watch the same video at 2x to see if they can
comprehend it and often times they can't (but sometimes they can). We're all
in the same area.

I generally find a neutral accent and very clear enunciation help the most.
I've had a bunch of people say they've watched my videos / courses at 2x
without issues because apparently I have no accent, which is something I've
heard from a number of people in different countries where English isn't their
native language. I find it interesting because I've also heard a decent number
of people say I speak very fast at 1.0x speed, so I do believe accent and
enunciation have at least some role in this.

Does anyone know of any software where you can feed it an English audio sample
and it spits back the number of syllables per second? Seems like a pretty cool
potential ML project.
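No pointer to existing software, but a crude, purely text-based heuristic
(counting vowel groups in a transcript, with a hypothetical duration supplied
by hand; real tools would work on the audio itself) might look like:

```python
import re

def estimate_syllables(text):
    """Crude heuristic: count vowel groups per word (minimum 1 per word)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)

transcript = "human speech may have a universal transmission rate"
duration_sec = 3.0  # would come from the audio file in practice
print(estimate_syllables(transcript) / duration_sec, "syllables/sec")
```

Silent-e words ("rate") overcount by one, so this is only a ballpark; per the
question above, an ML model trained on audio would do much better.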

~~~
ckosidows
Where are you from if people think you don't have an accent, if you don't mind
me asking?

~~~
nickjj
Long Island, New York.

I'm not trying to plug my channel but here's my latest public video from the
other week:
[https://www.youtube.com/watch?v=Kq_khHWovl4](https://www.youtube.com/watch?v=Kq_khHWovl4)

I do believe audio quality plays a -huge- role in this.

For comparison, here's the most recent Railscasts video from a few years ago
by another screencast author:
[https://www.youtube.com/watch?v=urPi4qZJeOE](https://www.youtube.com/watch?v=urPi4qZJeOE)

I can deal with him at 2x, but it's mentally taxing because his audio has a
metallic, wispy sound at that speed that makes his words sound blended
together. I think he also talks slightly slower than me at 1.0x, so it's not
just base talking speed. Does anyone else notice that metallic sound too?

Here's another sample of Joe Rogan and Bill Burr on a podcast:
[https://www.youtube.com/watch?v=cS1KWv0das8](https://www.youtube.com/watch?v=cS1KWv0das8)

Listening to them talk at 2x feels like a joy. They are talking a little
slower since it's a casual conversation, but the audio is crystal clear and
both of them have very good enunciation (not surprising since they talk on
stage for a living).

~~~
bscphil
I agree, you have a very neutral American accent, not much like a New Yorker
at all. Other things I noticed include that you are indeed pronouncing each
word very clearly, but also that you're speaking a bit more slowly than I
would consider a typical speech rate (though not necessarily slower than the
average Youtube video). In this video, you're also speaking at a bit higher
pitch than a typical male voice, which I think further aids clarity because
"sharper" syllables come out very clearly instead of being muddled like they
are in very deep voices.

That said, I couldn't comprehend you well at all at 2x speed (using the
Youtube controls). This might just have been due to distortion caused by the
Youtube player on my computer, I'm not sure. At 1.75x you were still very
clear, though I suspect at that rate I would find myself pausing the video now
and then to think about what you were saying.

~~~
nickjj
Thanks for the listen. All of my tests are always using Youtube's playback
controls btw.

Were you able to listen to Joe Rogan's podcast at 2x? Skip somewhere in the
middle and listen for 15 seconds maybe.

You are right in that I speak slower in that video. Most of my more recent
Youtube videos are unscripted so I'm just thinking about things with zero
preparation, whereas I script my courses word for word (which generally leads
to faster speaking), but I don't have any course videos with the same audio
equipment to compare side by side.

I didn't read the paper but I wonder what they count as comprehension.
Personally I wouldn't listen to hardcore technical things at 2x because
understanding the words isn't usually the goal. The goal is to fully absorb
and understand what you're listening to so you can apply it on your own later.
There's a big difference between a mechanical understanding of the words and
really "getting" what you're listening to.

I typically reserve 2x for listening to tech talks where my goal is to get a
high level overview of something quickly.

~~~
bscphil
> Were you able to listen to Joe Rogan's podcast at 2x? Skip somewhere in the
> middle and listen for 15 seconds maybe.

Hmm, I skipped around to a few different spots. About half the time it was
intelligible at 2x, but as soon as they started speaking faster it became a
garble. Occasionally they would speak fast enough that I couldn't catch it
even at 1.75x. So I'd say they have a lot more variability in their pacing
than you do.

------
Merrill
Typing at 60 words/minute, 5 characters/word and 8 bits/character gives a
gross bit rate of 40 bits/second.

Of course, the information rate is a lot less, since there are fewer than 8
bits of information per character in English. The paper says "from 4.8 bits
per syllable for Basque to 8.0 bits per syllable for Vietnamese" and there are
multiple characters per syllable. So the typing information bit rate is
probably somewhere around 10 to 15 bits/second.
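The arithmetic above can be checked in a few lines; the ~2.5 characters per English syllable used to bridge characters and the paper's per-syllable figures is an assumed value:

```python
# Gross typing rate: 60 words/min * 5 chars/word * 8 bits/char
gross_bps = 60 * 5 * 8 / 60  # bits per second
print(gross_bps)  # 40.0

# Information rate via the paper's syllable densities (4.8-8.0
# bits/syllable), assuming roughly 2.5 characters per syllable:
chars_per_sec = 60 * 5 / 60          # 5 chars/s at 60 WPM
syllables_per_sec = chars_per_sec / 2.5
lo = syllables_per_sec * 4.8
hi = syllables_per_sec * 8.0
print(lo, hi)  # roughly the 10-15 bits/s ballpark
```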

~~~
aidenn0
IIRC 1.3 bits per character is the rule-of-thumb value for English text
entropy, which would yield 6.5 bps at 60 WPM; an average stenographer can go
at 4x that rate, for approximately 26 bps.
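As a quick check, that rule of thumb works out as:

```python
bits_per_char = 1.3          # rule-of-thumb entropy of English text
chars_per_sec = 60 * 5 / 60  # 60 WPM at 5 chars/word -> 5 chars/s
typing_bps = bits_per_char * chars_per_sec
steno_bps = 4 * typing_bps   # stenographers run roughly 4x typing speed
print(typing_bps, steno_bps)  # 6.5 26.0
```

Interestingly, even the stenographer's 26 bps sits below the paper's 39 bits/s speech average.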

------
bscphil
I took a peek at the paper and discovered the following: among the languages
they studied, English was second only to French in the number of bits of
information per second conveyed on average. (English also had significantly
fewer syllables per second than French.)

Thus, if you're choosing a language to communicate in on the basis of how fast
it is to get an idea across, English and French are likely your best choices!
(Among languages in the survey.)

------
bsza
> Each participant read aloud 15 identical passages that had been translated
> into their mother tongue

I doubt there is an objective way to ensure that no information is lost or
smuggled in when you translate a text into another language. For example,
English has more than 100 words for 'walk', whereas Toki Pona (a constructed
language known for its extreme simplicity) has only one. But does 'stroll'
encode the same amount of information as 'tawa'? Depends on what you want to
use that information for, I guess. If you only want to know where I went this
morning, they are equally good. If you want to know that my act of going there
was a recreational activity, possibly part of my morning routine, they are
not.

~~~
TacticalTable
It's my interpretation that by deciding the bits of each word, they included
subtext/classification that would differentiate 'walk' and 'stroll'.

------
akjetma
_Scientists started with written texts from 17 languages, including English,
Italian, Japanese, and Vietnamese. They calculated the information density of
each language in bits—the same unit that describes how quickly your cellphone,
laptop, or computer modem transmits information. They found that Japanese,
which has only 643 syllables, had an information density of about 5 bits per
syllable, whereas English, with its 6949 syllables, had a density of just over
7 bits per syllable. Vietnamese, with its complex system of six tones (each of
which can further differentiate a syllable), topped the charts at 8 bits per
syllable._

how can you encode 643 syllables using 5 bits? same for 6949 syllables/7 bits?

~~~
geomark
If I understand this correctly, it isn't that they are uniquely encoding each
syllable. It's that they are encoding the information in each syllable. Many
syllables have very low information content and must be combined with other
syllables to convey information. Many other syllables are redundant.
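To make that concrete: uniquely enumerating 643 syllables needs about 9.3 bits, but the *average* information per syllable (the Shannon entropy) is lower whenever some syllables are far more frequent than others. A toy sketch with a made-up, Zipf-like frequency distribution:

```python
import math

n_syllables = 643
print(round(math.log2(n_syllables), 2))  # 9.33 bits to enumerate uniquely

# Made-up skewed distribution: syllable at rank k has weight 1/k.
weights = [1 / (rank + 1) for rank in range(n_syllables)]
total = sum(weights)
probs = [w / total for w in weights]

# Shannon entropy: -sum(p * log2(p)) = average bits per syllable
entropy = -sum(p * math.log2(p) for p in probs)
print(round(entropy, 2))  # lower than the 9.33 needed for unique codes
```

The paper's per-syllable figures are entropy-style averages like this, not code lengths, which is why Japanese can come out at ~5 bits despite having 643 syllables.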

------
ta1234567890
> After noting how long the speakers took to get through their readings, the
> researchers calculated an average speech rate per language, measured in
> syllables/second.

> No matter how fast or slow, how simple or complex, each language gravitated
> toward an average rate of 39.15 bits per second

So does this mean that we are "understanding" only those 39 bits of syllables
per second, or more like we are using those 39 bits to index something like an
internal address space?

And if the latter is the case, how big would that address space be?

It would also be cool to see this complemented with the data rate
(bits/second) of emotion communicated per second and see if that increases the
total effective rate of communication between people.

~~~
skosch
The article explicitly concludes that the speaker is the bottleneck, not the
listener. You can understand sped-up Youtube videos just fine.

~~~
ta1234567890
Yes. And they say that we can listen at speeds up to 120% of normal, which
means about 46.8 bits/second of syllables.

So then, does it mean we "understand" 46.8 bits of information, or that we are
using those bits to address some other, maybe bigger/more complex or detailed,
memory space?

------
jumelles
Being able to convey emotion with tone in speech likely adds a lot more
"information".

~~~
_nhynes
The increased information density in tonal languages appears to be offset by a
reduced speaking rate [0].

[0] [http://ohll.ish-
lyon.cnrs.fr/fulltext/pellegrino/Pellegrino_...](http://ohll.ish-
lyon.cnrs.fr/fulltext/pellegrino/Pellegrino_to%20appear_Language.pdf)

~~~
pmontra
I think jumelles was referring to something like "don't do that" uttered as
fun, angry, etc.

------
impostir
I would be interested to know whether Rapid Serial Visual Presentation breaks
this rule for any language. My guess is that it wouldn't, but I would be
fascinated to find out. I would guess that languages with a higher density of
information per word would force the reader to slow their words-per-minute
rate. Anecdotally, I can read with RSVP at around 620 words per minute in
English and retain general comprehension. Sadly, I don't know another language
well enough to compare.

[https://en.m.wikipedia.org/wiki/Rapid_serial_visual_presenta...](https://en.m.wikipedia.org/wiki/Rapid_serial_visual_presentation)

------
bane
I recall a moment when, as an adult, I thought it might be fun to watch some
cartoons that were my favorite as a child. I actually ended up giving up
fairly quickly because the rate of the speech in many cartoons is apparently
geared to be very slow. I remember firing up the first episode of the classic
Thundercats show and being really struck at how much slower the characters
spoke than my memory of the show.

On the flip side, I found noted radio show host Diane Rehm to be virtually
unlistenable because her rate of speech is sooooo incredibly slow. Her guests
sound like they are all at 2x speed compared to her.

I'm pretty sure that this number changes as we age and our processing
faculties gear up and then down.

~~~
stormbeard
Diane Rehm has spasmodic dysphonia, which affects the way she speaks. She's
not just old.

~~~
bane
True, but spasmodic dysphonia doesn't cause slow speech [1]. It's the source
of other speech issues she's worked hard to overcome.

Her pattern of speech is in general among the slowest I've ever heard,
sometimes approaching single-digit words per minute. She's not always so slow;
during interstitials and other moments her speaking is just kind of slow, not
unbearable.

Despite my personal feelings, I think her pace of speaking is part of her
appeal. After being assaulted by other media all day, her show could be a very
relaxing listen and was nearly always a very intelligent conversation.

[1]
[https://www.youtube.com/watch?v=SqzfsKMaLqk](https://www.youtube.com/watch?v=SqzfsKMaLqk)

------
ineedasername
I didn't know the bit rate, but I learned this studying computational
linguistics. Languages that seem "fast" tend to have more phonemes per
semantic unit than "slow" languages. In general, languages with a smaller
inventory of phonemes require more of them to make a "word", so they get
spoken faster. So the number of semantic units per period of time remains
fairly constant across languages.
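The article's own figures illustrate the trade-off: if every language lands near ~39 bits/s, the implied speaking rate is just that constant divided by the bits per syllable. A sketch using the densities quoted in the article (the resulting syllable rates are inferred from the constant, not measured):

```python
TARGET_BPS = 39.15  # the paper's cross-language average

# bits/syllable quoted in the article (Basque and Vietnamese are the
# extremes; Japanese and English are given explicitly)
density = {"Basque": 4.8, "Japanese": 5.0, "English": 7.0, "Vietnamese": 8.0}

# denser languages imply slower speech, sparser ones faster
for lang, bits_per_syllable in density.items():
    print(f"{lang}: ~{TARGET_BPS / bits_per_syllable:.1f} syllables/s")
```

This predicts Basque spoken nearly twice as fast as Vietnamese in syllables per second, which matches the paper's observation that information density and speech rate offset each other.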

------
fargle
Mm.Hmm... Not sure the researchers have yet encountered the Deep South (US). I
estimate approximately 17 bits of information in "Mm.hmm..." based on the 128K
possible meanings. It takes about 5-10 seconds for proper gestation, when
uttered expertly during the course of a lively conversation.

So more like approximately 1.7 to 3.4 bits/s.

~~~
fargle
y'all

------
alexandercrohde
I don't get the methodology. If you have a short-story, and translate it into
10 languages (and translate very well), it should have the same amount of
information in it in each translation, no?

Therefore the transmission rate will simply be proportional to the time to
read the story. This idea contradicts what their study found, no?

------
drdeadringer
This is of at least tangential interest to me, as one of my side projects is
to create a custom "numbers station". Maybe out of a Raspberry Pi or similar;
I'm still very much in the preliminaries. Since I'm in the US, I am aware of
the FCC.

------
beefman
See also this story from 8 years ago, which was about a previous study by the
same group

[https://news.ycombinator.com/item?id=2976044](https://news.ycombinator.com/item?id=2976044)

------
iconjack
I once saw research that said English speakers utter about 4 words per second.
Shannon once empirically determined each English word carried about 6 bits of
information. For what it's worth.

------
joeyrosztoczy1
I feel like there's an important follow up study in how much humans can
compress data with respect to those bits (cat -> large -> mane ~= lion).

------
JimBrimble35
I would love to see the impact of acronyms on information density. It would
also be interesting to see how many bits per second the average human maximum
is.

~~~
dredmorbius
YMMV, TLAs increase s/n but leverage ROI on prefamiliarisation.

Language is symbolic, all words are pointers. Whether you collapse complexity
through an Apollonian use of religious icons or through initialisations and
acronyms likely matters little.

But: TANSTAAFL.

------
auiya
Are there weights applied for regional disparities? For instance sped up for
American New Englanders and slowed down for American South Easterners?

------
poormystic
I wish I could provide references... I recall a tutor stating that human
speech can be encoded at a very similar rate. This was in 1985 I think.

------
akjetma
i wonder if there's a constant average reading speed (in terms of bits per
second) across written languages. chinese is obviously more dense per symbol
than english, but is each character processed more slowly as a result,
preserving constant data rate across languages? also interesting to consider
how much harder real time, verbal translation would be if data rates differed.

------
cellular
YouTube at 2x with captions can speed this to less than 78 seconds. "Less
than" to account for advertising.

------
windlessstorm
But we communicate at a much higher overall bandwidth, because we use parallel
channels, each with its own low bitrate limit.

------
sasaf5
Interesting, I did a similar analysis for music scores before and arrived at
30 bits per second.

------
hsnewman
So what the title says is: It may or may not have that rate. Moving on,
nothing to see here.

------
Abishek_Muthian
I bet Elon Musk is going to use this data during his next talk about
Neuralink: the human brain is limited to 39 bits/sec when outputting speech,
and a BCI would surpass this limitation by outputting to smartphone speakers
directly.

------
metrxqin
So speech is a very ineffective way to communicate?

------
anotheryou
the "text" portion? or already with intonation etc?

------
soniman
39.15/64 = phi

~~~
rocqua
and 64 seconds is relevant here because?

~~~
soniman
It's 64 bits. The rate of information is measured in bits, and so 64 was the
closest binary (2^6) denominator to 39.15 bits/sec.

------
humble_engineer
Curiously, this is also why memes are much better at conveying complicated
political or sociological ideas faster and more efficiently than a few
paragraphs on the topic.

