
I'm the blind dev who refactored a huge chunk of the Rust compiler [0]. I'm at roughly 800 words a minute with a synth, with the proven ability to top out at 1219. 800 or so is the norm among programmers. In order to get it we normally end up using older synths which sound way less natural because modern synthesis techniques can't go that fast. There's a trade-off between natural sounding and 500+ words a minute, and the market now strongly prefers the former because hardware can now support e.g. concatenative synthesis.

1219 is a record as far as I know. We measured it explicitly by getting the screen reader to read a passage and dividing. I spent months working up from 800 to do it and lost the skill once I stopped (there was a marked level of decreased comprehension post 1000, but I was able to program there; still, in the end, not worth it). When I try to describe the required mental state it comes out very much like I'm on drugs. Most of us who reach 800 or so stay there, though not always that fast for e.g. pleasure reading (I do novels at about 400). It's built up slowly over time, either more or less explicitly. I did it because I was in high school doing MUDs and got tired of not being able to keep up; it took about 6-8 months of committing to turn the synth faster once a week no matter what, keeping it there and dealing with a day or two of mild headaches. Note that for most blind people these days, total synthesis time per day is around 10+ hours; this stuff replaces the pencil, the novel, etc. Others just seem to naturally do it. You have little choice, it's effectively a one-dimensional interface, so from time to time you find a reason to bump the knob. And that's enough.
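For anyone wanting to repeat the "read a passage and divide" measurement, here is a minimal Python sketch. It assumes you time the screen reader yourself with a stopwatch; the function name and the numbers are hypothetical illustrations, not the figures from the measurement described above.

```python
def words_per_minute(passage: str, seconds: float) -> float:
    """Return the speaking rate for a passage that took `seconds` to read aloud."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    word_count = len(passage.split())  # whitespace-delimited words
    return word_count * 60.0 / seconds


# Hypothetical example: a 400-word passage read in 30 seconds.
passage = "word " * 400
print(words_per_minute(passage, 30.0))  # 800.0
```

Counting whitespace-delimited words is only one possible definition of a "word"; a synth's own rate setting may count differently.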

Whether and how much the skill transfers to normal human speech, or even between synths, is person-specific. I can't do Youtube at much beyond 2x. Others can. It's definitely a learned skill.

0: https://ahicks.io/posts/April%202017/rust-struct-field-reord...

And as a followup to that--because really this is the weird part--some circles of blind people (including mine) talk faster between ourselves. That's not common, but it happens. I still sometimes have to remember that other people can't digest technical content at the rate I say it and remember to slow down. A good way to bring it out is to have me try to explain a technical concept that I understand really well. I have the common problem in that situation of not being able to talk as fast as I think, but I also seem to have the ability to assemble words faster in a sort of tokenize/send to vocal cords sense once I know what I want to say.

To me, the fact that this does in fact seem to be bidirectional at least some is more interesting than that I can listen fast.

I kind of want to hear a "blind Danish" conversation now. For those unaware: Danish is phonetically one of the most difficult languages in the world, to the point where children who acquire Danish as their first language on average start speaking a few months later than children with any other first language. To clarify: speaking is not the same as understanding - toddlers are often capable of understanding language before they have the motor control required to speak it, which is why baby sign language exists.

Actually, understanding seems to be affected, too! See:


Oh wow, did not know that! Then perhaps a better way to phrase it is that it does not imply slower general mental development of the children - at least that is what I read elsewhere.

Are you Danish? (your account name looks rather Dutch)

I always wanted to know whether it's true what I heard some years ago: a friend told me that older Danish people are complaining that the younger Danish are harder to understand because there has been a trend in recent decades to swallow consonants even more than in the past

(that's something hard to imagine for me as a non-Danish speaker :)

No, I'm Dutch like you guessed. While the Netherlands and Denmark are very similar in some aspects (flat countries obsessed with having the best bike infrastructure in the world) the language is quite different ;)

Living in Malmö I also find this quite hard to believe! Besides, isn't old people complaining about the younger generation butchering their language something that happens all the time in all cultures?

This is vague, because it was long ago, but I remember reading an article in a newspaper (Dagens Nyheter, Sweden, IIRC) in... Oh, the 1990s, I think.

It was about how Danish linguists were worried about how the language was getting more unintelligible. The example I remember was about the ever-increasing number of words that are pronounced, basically, “lää'e”: läkare, läge, läger, lager... Those are written in Swedish here; some Dane may translate. (In English: healer, situation, camp [laager], layer...) There were more that I don't remember; in all, I think they mentioned at least half a dozen words that had become basically indistinguishable from each other.

I don't think serious linguists have had the same fear about most other languages.

It is common for any "in-group" to speak faster than the same people in other groups. This is commonly studied in linguistics 101 courses by undergrads because it is so easy to observe - in your own groups - once you are looking for it.

But when the in-group is defined as "is blind"?? I'm not just talking about programmers in this context, or any other cross-section wherein there's some sort of shared vocabulary and context other than the disability itself. I don't think it's been studied, but I've noticed, my parents have noticed, in general enough people around me have noticed it over the years that I'm convinced the effect is real. Is whatever mechanism you're referring to typically general enough that the group can be defined this broadly and still have it happen?

That's awesome. It reminds me of the Bynars in Star Trek, who evolved ultra-fast spoken language: https://www.youtube.com/watch?v=52_iSQnB6W0

I don’t think this is a blind person thing. Ask any nerd about something they’re into, and you stand a pretty good chance of receiving a firehose of words representing their stream of consciousness.

Has anyone tried overlapping words instead of speeding them up? Like so:

I often wondered if this, or at least sped up speech, should be the default robotic interface... it would make sense to optimize for efficiency/speed (while maintaining legibility) if we can do so.

Wow, that's incredible. Do you find it frustrating talking to actual humans now? I'd imagine it feels like they're speaking in slow motion.

Edit: Hah, just saw your post on talking faster to other people who have the same audio skills.

Other people in realtime aren't...I guess the best way I have to put it is informationally sparse. There's a lot going on beside what's being said in conversation. Synths don't imply things for example; in a context with active implication slowing/pausing the synth is sometimes necessary. The skill doesn't extend beyond the blatant transfer of information. In social contexts and especially when you can't go off any visual cues whatsoever to figure out what the other person is thinking/feeling, there's a lot more going on than simple information transfer.

However most blind people I know who do this start hating audiobooks, start hating talks, and generally by far prefer the text option. Audiobooks aren't annoying, but they're below my baud if you will. Net result: boredom/falling asleep to whatever it is and the need to actively make an effort to listen. Some things which require active listener participation--math lectures for example--are different. I guess the best way I can put it is that speed is inversely proportional to the amount of, I guess let's call it active listening, required.

I've given a lot of thought to this stuff, but we don't really have the right words for me to communicate it properly. A neuroscientist or linguist might, but I'm not either of those.

This is fascinating; thank you. How does an audio book read by the author, such as Anthony Bourdain's Kitchen Confidential, which contains autobiographical information, compare? Is it more like a social situation that you need real time to absorb, or do you prefer it at a higher speed? How does a stage play compare? Do you watch movies sped up?

Also, how does your "baud" vary with your familiarity with the ideas? I can't imagine it's independent. As a ~35 year old programmer with a decade of professional experience and a decade of hobby experience before that, I cracked open SICP for the first time and found almost everything familiar. I had digested the ideas from other sources, so I could read at a "natural" rate. If I had read it as a teenager, it would have been a mindfuck, and I would have taken multiple slow readings to understand. When you talk about numbers like 800, are you talking about writing that challenges you and changes the way you think, or are you talking about stuff you do for a living that is just information you're already primed to accommodate?

I haven't specifically tried different types of audiobooks to see if there's some preferred category.

With movies I don't bother with them unless they have descriptive audio, at which point you've got music, sound effects, and two somewhat parallel speech streams going on. That's high informational content.

I did an entire CS degree at 800 words a minute. I program in any programming language you care to name (including the initial learning) at that speed as well. For more complicated concepts I stay at that speed, but pause after every paragraph or so to chunk the content as needed. I'm doing this thread at that speed. Pretty much the only time I slow it down is pleasure reading or sometimes articles when I want to go off and do chores while I listen, but even then it's still faster than human speech.

In general I think answering these sorts of questions needs research that we don't have to my knowledge. Nothing in my personal experience or background really allows me to give you good definitive answers. The sample size to work with is pretty small and in all honesty there's not a lot of good research around blindness day-to-day in the first place.

> audiobooks aren't annoying, but they're below my baud if you will. Net result: boredom/falling asleep to whatever it is and the need to actively make an effort to listen. Some things which require active listener participation--math lectures for example--are different.

That sounds like a similar description to what it's like for people with IQs significantly higher than the average.

I can't find any recordings of 800 WPM synths. Would it be possible for you to make one? I'm curious about what it sounds like.

I don't think it'd be legal for me to hand you an espeak recording, but it works fine at 800WPM.

    espeak -s 800 "Things to say."

You can pass Espeak recordings around legally. It's just GPL. The license applies to the software, not the content produced via it.

I will attempt to remember and find the time to take my demo recording of this on Rust compiler source code that's currently in dropbox and put it up somewhere more permanent. I doubt Dropbox will care for me much if I allow HN-volume traffic to hit my account. It's Espeak using an NVDA fork with an additional voice that some of us like, so vanilla espeak is in the ballpark.

What I don't remember is if vanilla non-libsonic espeak softcaps the speech rate. It might. I believe new versions of espeak integrate libsonic directly, but that old versions just silently bump the speaking rate down if it's over the max. I haven't used command line espeak directly for anything in a very long time.

Libsonic is an optimized library specifically for the use case of screen readers that need to push synths further: https://github.com/waywardgeek/sonic
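For anyone who wants to make their own recording instead of waiting for mine, a rough Python sketch follows. It only relies on espeak's `-s` (words per minute) and `-w` (write a WAV file) options; the helper names and the output filename are made up for illustration, and as noted above the installed espeak may silently cap the rate.

```python
import shutil
import subprocess
from typing import List


def espeak_command(text: str, wpm: int, wav_path: str) -> List[str]:
    """Build an espeak command line: -s sets words per minute, -w writes a WAV file."""
    return ["espeak", "-s", str(wpm), "-w", wav_path, text]


def record(text: str, wpm: int, wav_path: str) -> bool:
    """Run espeak if it is installed; return False when the binary is missing."""
    if shutil.which("espeak") is None:
        return False
    subprocess.run(espeak_command(text, wpm, wav_path), check=True)
    return True


# Show the command that would be run (the filename is a hypothetical example).
print(" ".join(espeak_command("Things to say.", 800, "demo_800wpm.wav")))
```

Calling `record("Things to say.", 800, "demo_800wpm.wav")` would then produce the file, assuming espeak is on the PATH.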

Here's an online version, I bet it sounds similar to the original program: https://eeejay.github.io/espeak/emscripten/espeak.html

There is a range slider that maxes out at 450, which is the maximum speed according to the manual.

I tried listening to a Wikipedia article at 450, I am so amazed you can comprehend that. Perhaps that's equivalent of me visually scanning the text instead of reading, however when I do that, I tend to focus on interesting parts for long stretches of time. With espeak, how do you focus? Can you pause it at will?

Screen readers have a lot of commands for reading different sized chunks of content. In general there's probably around 50 keystrokes I use on a daily basis. It's not as straightforward as reading from top to bottom, though it can be. I can usually do a Wikipedia article without pausing at 450 or so.

If anyone is curious, here is the NVDA keystroke reference: https://www.nvaccess.org/files/nvdaTracAttachments/455/keyco...

As an interesting sidenote, screen readers have to co-opt capslock as a modifier key, then there's fun with finding keyboards that are happy to let you hold capslock+ctrl+shift+whatever.

> Whether and how much the skill transfers to normal human speech, or even between synths, is person-specific. I can't do Youtube at much beyond 2x. Others can. It's definitely a learned skill.

I find that the maximum understandable rate varies a lot between speakers. For some speakers 2.5x is possible, but just 1.5x for others.

One advantage synths have is that they can more easily control the speed at which words are spoken and the pauses between words independently. When watching or listening to pre-recorded content I often find that I'd want to speed up the pauses more than the words (because speeding up everything until the pauses are sufficiently short makes the words unintelligible).

If someone knows of a program or algorithm that can play back audio/video using different rates for speech and silence, please share.
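Not a finished program, but the basic idea can be sketched in a few lines of Python: split the audio into short frames, treat low-energy frames as silence, and keep only every Nth silent frame while leaving speech frames untouched. Everything here (function names, frame length, threshold, speedup factor) is an assumption chosen for illustration, and this crude frame-dropping would likely click at the cut points without cross-fading.

```python
import math
from typing import List


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def shorten_pauses(samples: List[float], frame_len: int = 400,
                   threshold: float = 0.01, pause_speedup: int = 3) -> List[float]:
    """Keep all loud frames; within a run of quiet frames keep every Nth one."""
    out: List[float] = []
    silent_run = 0  # number of consecutive quiet frames seen so far
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) < threshold:
            if silent_run % pause_speedup == 0:
                out.extend(frame)
            silent_run += 1
        else:
            silent_run = 0
            out.extend(frame)
    return out


# Synthetic test signal: a tone, a long pause, then the tone again (16 kHz assumed).
speech = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(800)]
signal = speech + [0.0] * 2400 + speech
shortened = shorten_pauses(signal)
print(len(signal), len(shortened))  # 4000 2400
```

The speech frames pass through unchanged, so a separate time-stretch could still be applied to them at a gentler rate.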

Are old speech synths not harsh on the ears to listen for longer periods? Or maybe I'm just familiar with the super robotic ones (I like them for music production).

If so, have you considered using an EQ plugin to maybe turn down the harsher high frequencies a few notches? Just a thought.
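To make the idea concrete, here is a minimal Python sketch of turning down high frequencies. It uses a one-pole low-pass filter as a crude stand-in for a real EQ's high-shelf cut; the cutoff, sample rate and function name are assumptions for illustration, and it works on a plain list of samples rather than plugging into any screen reader.

```python
import math
from typing import List


def low_pass(samples: List[float], cutoff_hz: float, sample_rate: int) -> List[float]:
    """One-pole low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out: List[float] = []
    prev = 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


# A signal alternating +1/-1 is the highest frequency a sampled signal can hold.
harsh = [1.0 if n % 2 == 0 else -1.0 for n in range(200)]
steady = [1.0] * 200  # a constant (zero-frequency) signal for comparison

print(max(abs(v) for v in low_pass(harsh, 4000.0, 16000)[-10:]))  # well below 1.0
print(low_pass(steady, 4000.0, 16000)[-1])  # settles very close to 1.0
```

The high-frequency signal comes out at less than half its original level, while the constant signal passes through almost unchanged.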

They're harsh. But you get used to it in about a week. Espeak is an atypically bad example, which is why NVDA experimented with a fork (and maybe one day the NVDA work will make it upstream). Part of what allows them to stay intelligible is the harshness. I've never tried passing one through an EQ but there are already pitch settings and similar to play with, and given that even not wearing headphones slows me down I expect that an EQ would probably be bad for it.

But more to the point there is nowhere to really plug that in to a screen reader, so we can't try it anyway. The audio subsystems of most screen readers are much less advanced than you'd think.
