Is this somehow using the Wayback Machine to scrape pages? When I try some of the classic SSRF payloads like file:/// or http://localhost:8080, I get audio about the Internet Archive.
Not sure about OP but I just implemented this in my Hacker News android client (thanks for the idea OP).
This is how I implemented it. I had already achieved article to "reader mode" extraction by heavily customizing the Kotlin port of Mozilla's Readability:
From the extracted "reader mode" text, I had to do further extraction to get rid of things like image captions, author names, article publish timestamps etc. Links also had to be removed so the TTS engine wouldn't read them out.
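A minimal sketch of the link-removal step, in case it helps anyone doing the same. The regex and the name `stripLinks` are illustrative assumptions, not the code from the app, and it only handles bare URLs left in plain text:

```kotlin
// Hypothetical helper: matches bare http(s):// and www. URLs in plain text.
private val urlPattern = Regex("""(?:https?://|www\.)\S+""")

fun stripLinks(text: String): String =
    text.replace(urlPattern, "")              // drop the URLs themselves
        .replace(Regex("""[ \t]{2,}"""), " ") // collapse the gaps they leave behind
        .trim()
```

Calling `stripLinks("Read more at https://example.com/post today.")` leaves a sentence the TTS engine can read without spelling out the address.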
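Then I pass the text to Android's TextToSpeech library and it works very well:
    import android.speech.tts.TextToSpeech
    import java.util.Locale

    // Assumed to be declared elsewhere: `tts` (a nullable TextToSpeech property)
    // and `appContext` (the application Context).
    fun trySpeaking(str: String) {
        fun speak() {
            tts?.speak(str, TextToSpeech.QUEUE_FLUSH, null, null)
        }

        tts?.also {
            // Engine already exists: cut off any current speech and start over.
            it.stop()
            speak()
        } ?: run {
            // First call: create the engine and speak once initialization succeeds.
            tts = TextToSpeech(appContext) { status ->
                println("Speak: $status")
                if (status == TextToSpeech.SUCCESS) {
                    // Set the language here: it has no effect before the engine is ready.
                    tts?.language = Locale.US
                    speak()
                }
            }
        }
    }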
If someone wants to do a tl;dr on the article text, they could pass it through the SMMRY API (which Reddit's TL;DR bot uses):
Well, just download NVDA and run it through your RSS reader. It will read RSS feeds for you. I don't think that's what you're getting at, but it will do that, just as it does other things.
It would be like a cross between an RSS reader and a podcast app, so I can listen to my favorite blogs instead of reading them. And the list would update from the RSS feed. It would probably require some text cleaning like [1] and a good text-to-speech API.
You will not win against SSRF using a blocklist of names. You need a comprehensive solution that is designed for this and that checks the final address after redirects, DNS resolution, etc. have been applied by whatever library you use for HTTP. An example is Advocate for Python: https://github.com/JordanMilne/Advocate
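To illustrate the "check the resolved address, not the name" idea, here is a rough Kotlin sketch using only `java.net`. The function names are hypothetical and this is not how Advocate works internally. It is also not exhaustive (for example, IPv6 unique-local addresses in fc00::/7 are not covered by `isSiteLocalAddress`), and it has to run for every redirect hop against the address the HTTP client actually connects to, otherwise DNS rebinding can slip between the check and the fetch.

```kotlin
import java.net.InetAddress
import java.net.URI

// True if the address points somewhere a public fetcher should never reach.
fun isInternalAddress(addr: InetAddress): Boolean =
    addr.isLoopbackAddress ||   // 127.0.0.0/8, ::1
    addr.isAnyLocalAddress ||   // 0.0.0.0, ::
    addr.isSiteLocalAddress ||  // 10/8, 172.16/12, 192.168/16
    addr.isLinkLocalAddress ||  // 169.254/16 (cloud metadata), fe80::/10
    addr.isMulticastAddress

// Hypothetical gate to call before each request, including every redirect hop.
fun isSafeToFetch(url: String): Boolean {
    val uri = runCatching { URI(url) }.getOrNull() ?: return false
    val scheme = uri.scheme?.lowercase() ?: return false
    if (scheme != "http" && scheme != "https") return false  // rejects file:, ftp:, ...
    val host = uri.host ?: return false
    // Resolve the name and inspect every address it maps to.
    val addresses = runCatching { InetAddress.getAllByName(host) }.getOrNull() ?: return false
    return addresses.isNotEmpty() && addresses.none(::isInternalAddress)
}
```

With this, both `file:///` and `http://localhost:8080` are rejected because of what they resolve to, not because the string "localhost" appears on a list.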
This is cool! I recently tried to use VoiceOver on my iPhone and it’s… astonishingly difficult to use for just text to speech!
One minor request: I realize the form page itself is very minimal, but this would be much more usable on a phone if you set the font-size of the input to at least 16px and added a reasonable viewport meta tag, e.g. `<meta name="viewport" content="width=device-width, initial-scale=1">`
Not certain, I hear some saying it still has an effect on at least iOS Safari <https://news.ycombinator.com/item?id=25629377>, though I’m not sure whether that’s true of current iOS Safari any more, or just slightly older versions that are still in use. I’d love to have someone test carefully and spell out exactly what it achieves, and for what versions.
So yeah I use VoiceOver to navigate my phone on a daily basis, but I'm not a fan of reading things with it. If I'm going on a trip or something, I'll put the Kindle app on my phone.
Otherwise I use NVDA on my desktop for that. It's much more efficient for reading long things.
But if you put https://example.com into the input on the index page with JavaScript disabled, it simply won't work. There is also no notice that it doesn't work without JavaScript.
The solution is simple, and I don't think it would be hard to change how your back-end receives the user-supplied URL. I think it would make more sense to accept it as a URL query parameter.
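A plain HTML form submitted with GET would then work without any JavaScript, and the back-end only needs to read one parameter. A minimal sketch of that read, assuming a hypothetical `/listen` path and a parameter named `url` (neither is from the actual service):

```kotlin
import java.net.URI
import java.net.URLDecoder

// Hypothetical: pull the target out of e.g. /listen?url=https%3A%2F%2Fexample.com
fun targetFromQuery(requestUri: String, param: String = "url"): String? {
    val rawQuery = URI(requestUri).rawQuery ?: return null
    return rawQuery.split("&")
        .map { it.split("=", limit = 2) }                // "key=value" -> [key, value]
        .firstOrNull { it.size == 2 && it[0] == param }
        ?.let { URLDecoder.decode(it[1], "UTF-8") }      // undo the form's percent-encoding
}
```

A side benefit is that `/listen?url=...` links become shareable and bookmarkable.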
I have something pretty similar that I built some time ago for personal use: it lets me listen to articles while I'm working on things and can't dedicate my attention to anything else. I've been wondering if I should open source it (shove it all into a Docker container that people can run locally). I would have loved to publish it as a service, but two things are stopping me: the cost of running it, and potential licensing issues. But open source... And it runs pretty smoothly on a CPU (albeit a 14-core Xeon and an 11th-gen i7).
Same here, we built a scraper to (more or less) intelligently extract text from websites, I think adding TTS to it would be trivial. I'm guessing the hardware requirements are for the TTS engine? Could the Web Speech API be used to generate the audio locally to bring down costs? Even though you'd lose support for IE and FF.
Exactly what I did: a simple web interface I run inside Rambox along with all my chats, email clients and whatnot. Paste a URL and it scrapes it and generates the audio to be played in the browser (Rambox in this case). OK, I do keep a copy of the audio, the text and some other metadata, but that's completely optional.
As for TTS, while cloud speech is unmatched, it's still proprietary and not free. However, there are plenty of really good open source solutions which work extremely well (sample from the one I'm using: https://storage.googleapis.com/adocs_g/example.wav). I use it all the time and in terms of resource usage it's completely unnoticeable. But again, I'm the sole user and on pretty powerful hardware so...
That sample does sound pretty convincing. Just out of curiosity, that spec'd out machine is sitting idle most of the time, only to generate some TTS occasionally?
No, nothing else as far as TTS is concerned. It is my workstation, however, so it's doing a million other things simultaneously. Full disclosure: I have no idea how far the TTS can be pushed on it, but it is single-threaded, so there is room for more should there be a need for it.
I do not think extra spaces are the issue (since there aren't any in the link). I suspect it is converting fixed-size chunks of words into audio, and there is an awkward pause before each next chunk. Still, a very soothing and natural voice.
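If that guess about fixed-size chunks is right, one possible mitigation is to split at sentence boundaries so any pause lands where a speaker would pause anyway. A rough sketch, with a hypothetical function name and an arbitrary 200-character default (I don't know what the service actually does):

```kotlin
// Greedily pack whole sentences into chunks of at most maxChars characters.
// A single sentence longer than maxChars is kept intact as its own chunk.
fun chunkBySentence(text: String, maxChars: Int = 200): List<String> {
    // Split after ., ! or ? followed by whitespace (naive: trips on "e.g. this").
    val sentences = text.trim().split(Regex("""(?<=[.!?])\s+"""))
    val chunks = mutableListOf<String>()
    val current = StringBuilder()
    for (s in sentences) {
        // Flush the current chunk if adding this sentence would overflow it.
        if (current.isNotEmpty() && current.length + 1 + s.length > maxChars) {
            chunks += current.toString()
            current.setLength(0)
        }
        if (current.isNotEmpty()) current.append(' ')
        current.append(s)
    }
    if (current.isNotEmpty()) chunks += current.toString()
    return chunks
}
```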
Once in a while I use the speech feature of Chrome on macOS to have it read articles to me while I work. It's clunky though: it only works (AFAIK) on highlighted text, which gets annoying on really long articles, and there's no pause, so if I want to "pause" I need to stop it and then redo the highlight starting from where I paused (again, annoying with long articles).
I think I'll be using your service from now on! Browser/OS agnostic, no highlighting, and pausing!
Nice to haves:
- ability to pick a different voice
- playback speed controls
- maybe load the original article in an iframe below the player to be able to follow along
What people who use TTS on a regular basis usually want is something that gets the job done quickly. This is why natural-sounding TTS isn't generally a good thing.
That being said, there are some people who like it. I'm definitely not one of those people.
Very nice! I tried it out on one of my articles[0] and it worked very well. The only issue I had was that the code samples were spoken. Perhaps ignore <pre> sections?
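Skipping `<pre>` could be as small as one pass over the HTML before text extraction. A rough sketch, assuming the service has the raw HTML at that point; the names are hypothetical, and a real HTML parser would be more robust than a regex (nested or malformed markup will confuse this):

```kotlin
// Matches a whole <pre ...>...</pre> block, across newlines, case-insensitively.
private val preBlock = Regex(
    """<pre\b[^>]*>.*?</pre>""",
    setOf(RegexOption.IGNORE_CASE, RegexOption.DOT_MATCHES_ALL)
)

// Replace each code block with a space so the surrounding text doesn't run together.
fun dropCodeBlocks(html: String): String = html.replace(preBlock, " ")
```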
What if I wanted to listen to say a dynamically updating web page? Of course my screen reader will do that for me, but I'd like to try that with the service just for kicks.
This is good timing. I was just looking for a solution for a friend who has very bad eyesight due to an accident. They want to be able to listen to news articles and such.
It would be nice if we could choose languages other than English.
They may want to look into using a screen reader to avoid eye strain, or headaches depending on what their condition is. Everything has one built-in at this point.
Windows has Narrator, the Mac and iPhone both have VoiceOver, and Android uses TalkBack. For something extra, and more advanced, I definitely recommend NVDA.
https://www.nvaccess.org/
Great free service by the OP btw. We have a feature similar to this in https://narrationbox.com if you are willing to sign up. Almost all URLs work plus we have more than 300 voices to choose from. Disclosure: I am a cofounder.
When I put in https://localhost:443, I get audio for the default nginx page, which matches what the Internet Archive has for that URL: https://web.archive.org/web/20210620003533/http://localhost/
But putting http://localhost returns the audio from per.quest's home page.