
I’ve built two AI-based apps. Earlier this year, I wanted clarity on some issues. I knew several of my books held keys to the answers I was seeking, but finding them felt impossible. I wanted an AI-based tool I could use to talk to my books. I built https://www.asklibrary.ai to enable this, implementing lots of RAG techniques like query fan-out, query understanding and breakdown, multi-step retrieval, and reranking, and I pull in dozens of pages of text for each answer.
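A minimal sketch of the fan-out-then-merge idea mentioned above, assuming a toy keyword retriever in place of embeddings and reciprocal rank fusion (RRF) as the merge step. `fan_out`, `retrieve`, and `answer_context` are hypothetical names; a real pipeline would use an LLM to rewrite and break down the query and a dedicated reranker model, and this is not necessarily how asklibrary.ai works.

```python
from collections import defaultdict

def fan_out(query: str) -> list[str]:
    # Hypothetical stand-in for an LLM call that breaks the query down:
    # keep the original query plus each clause split on " and ".
    parts = [p.strip() for p in query.split(" and ") if p.strip()]
    return [query] + (parts if len(parts) > 1 else [])

def retrieve(subquery: str, corpus: dict[str, str], k: int = 3) -> list[str]:
    # Toy lexical retriever: rank passages by the number of shared lowercase terms.
    terms = set(subquery.lower().split())
    scored = [(len(terms & set(text.lower().split())), pid) for pid, text in corpus.items()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, ties by id
    return [pid for _, pid in scored[:k]]

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each passage scores sum(1 / (k + rank)) over every list it appears in,
    # so passages found by several sub-queries rise to the top.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] += 1.0 / (k + rank)
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

def answer_context(query: str, corpus: dict[str, str], top_n: int = 3) -> list[str]:
    # Fan out, retrieve per sub-query, then fuse into one ranked context list.
    rankings = [retrieve(q, corpus) for q in fan_out(query)]
    return reciprocal_rank_fusion(rankings)[:top_n]
```

RRF only needs ranks, not comparable scores, which is why it is a common way to merge results from several sub-queries before a heavier reranker looks at the survivors.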

Second, I love Claude and also use TypingMind, but I missed the memory feature from ChatGPT. I made MemoryPlugin (https://www.memoryplugin.com), which adds long-term memory to ChatGPT, Claude, Gemini, TypingMind, and LibreChat in desktop and mobile browsers. This got me really interested in AI memory in general; I’ve played around with fine-tuning AI models on memories (result: some data learned, way more hallucinations).


From a quick try, the results aren't good. It sounds bland, and the spoken text doesn't exactly match the text I typed. I didn't try voice cloning, though.

Why is good TTS so expensive and why are there no good open source options? Is it just from the need for high quality training data? I don't imagine these models are more expensive to run compared to SOTA LLMs, yet they cost so much more.


From what I'm seeing, most of the open source TTS models are trained on the same few voices, mostly at 16 kHz, and mostly from LibriVox books, I think.

ElevenLabs is most likely trained on stolen audiobooks. They published a few YouTube videos in Polish, since taken down, of AI renditions of famous Polish audiobook narrators. This was all before they became popular, and, I think, before their voice cloning models were publicly available.


> mostly from Librivox books

That probably explains a lot. I've tried listening to some of those audiobooks - very hit and miss, mostly miss. Definitely amateur hour and mostly bad quality.


I had pretty good results with coqui-tts and a VITS model I trained myself, first with an open dataset and later with one I extracted from audiobooks/EPUBs, which I therefore can't publish (German).

The dataset and video tutorials are all available and linked here (also in English):

https://www.thorsten-voice.de/en/motivation-vision/


Thanks for mentioning my Thorsten-Voice project, dear sandreas :)


You're very welcome.


a few weeks ago i used piper to create an acceptable translation of a book. i didn't listen to all of it, but the result sounded better than anything i had been able to listen to before. good enough to listen to a book if a human-read one is not available. just a few years ago, this was not the case.

in other words, while FOSS TTS lags behind commercial options, it does get better and i expect within a few years it will produce results that are at least as good as the commercial options today if not fully caught up.


Piper seems roughly equivalent to old-school TTS output: flat and jumpy, like the concatenative approach. Listen to the first example I tried:

https://rhasspy.github.io/piper-samples/samples/en/en_GB/ala...

Of all the TTS APIs I have tried, I like OpenAI voices the best. Haven't considered things like elevenlabs because I find them ridiculously expensive.

I love voice to voice interfaces, but only when they sound natural to my ears, and the current pricing for good ones is prohibitive for a huge number of use cases.


well, i was comparing it to the free tools available a few years ago, and against those, this example is a marked improvement. it's the first one i could actually bear to listen to over a longer period of time. i expect that in just another few years this will actually be good.


There are a lot of options. StyleTTS2 is pretty good, XTTSv2 is pretty good, the new E2 TTS and F5 TTS also seem decent.


A commercially available, high-quality training dataset is the key. Open source projects don't get the luxury of working with voice actors to record voices.


Would it be hard to create such a training dataset? Seems like you’d just need a lot of people to say a bunch of stuff for you?


needs a crowdsourced model


Ideally, Mozilla would step up here given their mission statement, but they won't, probably because their CEO needs another bonus.


Yeah there's no chance Mozilla would do anything like this:

https://commonvoice.mozilla.org/


That's the first thing I thought of! I wonder how much it's used. Are there any sources or data points indicating that this Common Voice data is being used, and if so, where and how? I think I contributed to it a few times years ago. Nice to see it's still going; it would be better to know it's being used.


It has been used quite a bit for speech-to-text, but for TTS it's not that great.


It costs a million dollars a year to host 32k hours of audio?


Have you tried VoiceCraft?


Yeah, all of these seem hyper-focused on "voice cloning", so on Replicate, VoiceCraft doesn't even let you try normal TTS unless you provide a reference voice, so I noped out.


I believe AI memory is a very important problem to solve. Our AI tools should get better and more personalised over time.

(I hope it's ok to share something I've built along a similar vein here.)

I wanted to get long-term memory with Claude, and as different tools excel at different use cases, I wanted to share this memory across the different tools.

So I created MemoryPlugin (https://www.memoryplugin.com). It's a very simple tool that provides your AI tools with a list of memories, and instructs them on how to add new memories. It's available as a Chrome extension that works with ChatGPT, Claude, Gemini, and LibreChat, a Custom GPT for ChatGPT on mobile, and a plugin for TypingMind. Think of it as the ChatGPT memory feature, but for all your AI tools, and your memories aren't locked into any one tool but shared across all of them.
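A minimal sketch of the "give the model a list of memories and tell it how to add new ones" approach described above. The `[[remember: ...]]` marker, `build_preamble`, `extract_new_memories`, and `MemoryStore` are all hypothetical illustrations, not MemoryPlugin's actual format or API.

```python
import re

# Hypothetical marker the model is instructed to emit when it learns something new.
MEMORY_TAG = re.compile(r"\[\[remember:\s*(.+?)\]\]")

def build_preamble(memories: list[str]) -> str:
    """Text prepended to a conversation: known memories plus instructions for adding more."""
    lines = ["Things you know about the user:"]
    if memories:
        lines.extend(f"- {m}" for m in memories)
    else:
        lines.append("- (none yet)")
    lines.append(
        "If the user tells you something worth remembering long-term, "
        "add it to your reply as [[remember: <fact>]]."
    )
    return "\n".join(lines)

def extract_new_memories(reply: str) -> tuple[str, list[str]]:
    """Split a model reply into the user-visible text and any memories it proposed."""
    found = [m.strip() for m in MEMORY_TAG.findall(reply)]
    cleaned = MEMORY_TAG.sub("", reply).strip()
    return cleaned, found

class MemoryStore:
    """In-memory stand-in for a shared store that every AI tool would read from."""

    def __init__(self) -> None:
        self.memories: list[str] = []

    def add(self, items: list[str]) -> None:
        for item in items:
            if item not in self.memories:  # skip exact duplicates
                self.memories.append(item)
```

In this sketch the store is the only shared piece: each tool (browser extension, Custom GPT, TypingMind plugin) would call `build_preamble` on the same memories, which is what keeps them from being locked into any one tool.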

This is meant for end-users instead of developers looking to add long-term memory to their own apps.


Do note it can take up to 24 hours or drop requests altogether. But if that’s not an issue for your use case it’s a great cost saving.


This is neat, I’ve been looking for a way to run our analytics (LLM-based) without affecting the rate limits of our prod app.

May need to give this a try!


What percentage of requests usually get dropped? Is it something minuscule like 1%, or are we talking non-trivial, like 10%?


If a site offers a reasonably priced alternative to ads, I'll opt for that. I've donated at other times when that option was available.

Otherwise I don't want to be tracked profusely. Ethics is sorely missing in online advertising.


The best (and only) implementation of this I’ve seen is https://all3dp.com/

If you visit with an ad blocker, they say “please disable your blocker or subscribe for $3/year.” Hit the subscribe button and you can pay with Apple Pay and be reading a 100% ethical, ad-free article in seconds.

Obviously transaction costs totally suck at prices that low, but one transaction a year helps I’m sure.


That model sounds great. Low friction and impulse-buy pricing.

There are lots of sites (AnandTech being a prime example) I don't visit often enough to justify the usual monthly subscription cost.

Per-article pricing with no registration would be ideal (yet another cryptocurrency use case that never materialized), but as you say, transaction fees make that a non-starter.
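A quick worked example of why fees dominate at these prices, assuming a typical card-processing fee of 2.9% plus $0.30 per transaction (an assumption; real rates vary by processor and country). `fee_share` is a hypothetical helper.

```python
def fee_share(price: float, pct: float = 0.029, fixed: float = 0.30) -> float:
    """Fraction of a payment lost to an assumed percentage-plus-fixed processing fee."""
    return (price * pct + fixed) / price

# One $3/year subscription: the fee is 0.087 + 0.30 = $0.387, about 12.9% of the price.
print(f"$3.00 once:    {fee_share(3.00):.1%} lost to fees")
# A $0.10 per-article payment: the fixed fee alone is 3x the price.
print(f"$0.10/article: {fee_share(0.10):.1%} lost to fees")
```

The fixed per-transaction component is what makes a single yearly charge workable and per-article micropayments a non-starter, unless many small charges are batched into one.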


I have another issue: I always delete cookies when the browser closes, which means I'd get this dialog every time I visited the site. Whitelisting a site in Firefox is relatively annoying, and once you throw in multiple devices, that dialog will always be there.

I don't really have a better idea besides automated micropayments, which nobody has managed yet (crypto doesn't count), so I guess we'll have to live with the current situation?


This comment (not from you personally, Asad, but the idea of it) is the very core of the reason why I have such an axe to grind on this topic.

Bring up the ugly topic that ads keep sites running, and you are showered with comments from people saying exactly what you said. Those comments get praise and lots of upvotes. Everyone pats themselves on the back.

But when you are on the other side of the equation, the one dependent on ad views and/or subscriptions, the numbers unequivocally show that people are totally full of shit. That they are just virtue signalling to receive praise and to push the skeleton back in the closet.

Again, not calling you out personally, I believe you do support creators. But I have done this song and dance many many times, and it always goes the same way.


Also, back in the day, some of us had a fair number of magazine subscriptions. But, really, at peak it was a small percentage of the number of websites I look at at least now and then. Consumption has generally changed and most of us are skittish about subscriptions generally even if we have a few.


The whole mode of taking in trade news has changed. 20 years ago when i bought a Maximum PC i read it cover-to-cover. Can't imagine doing that now with anything other than a book or a movie. Instead i'm reading the one or three most eye-catching articles that twenty different publications put out. Our much-beloved RSS (and old-school email newsletters) were the start of the slide here i think.

I still have a few subscriptions, especially if they send it out on a dead tree, but with the nature of the internet it's hazardous to not use an ad blocker. I've come to appreciate when publications run reminders that they are, in fact, also people who need to eat, and i try to make up for what i take from the trough by buying swag or sending a check if they take donations. But i get that there's not an enviable business plan on the other side of that equation. It's an ongoing evolution.


> Our much-beloved RSS (and old-school email newsletters) were the start of the slide here i think.

I'd place the shift happening earlier with early web portals. People made (or were coerced by their ISP) web portals their home page. The model of portals was show people headlines with direct links to the articles.

Hyperlinks are fundamental to the web so it's not like portals were doing something bad. It is just a model that's difficult to monetize for the destination site. More difficult than a traditional magazine or newspaper since the site only gets paid per actual impression vs paid per square inch from potential impressions estimated by circulation.

RSS readers were more about the democratization of portals since a site feed let the end user build their own "portal" from their collection of feeds. In terms of traffic patterns an RSS user was pretty similar to a web portal user, just a visitor that dropped in on some deep link and didn't necessarily hit any additional pages.


It's not your customers' fault that your business model is not viable, and guilting people into turning off AdBlock is manipulative and detrimental to overall human productivity. Asking people to watch ads is simply a bad trade-off, in the same way that burning trash to save on fuel is bad: to save $1 in fuel costs, thousands in environmental damage are caused. To make $1 from ads, many multiples of that are lost in productivity and in the proliferation of bad products.

Ad based businesses are as bad as door to door life insurance scammers, multi level scammers, etc.

In short, find a job that doesn't require damaging other people.

/Forgot to mention, watching ads without buying the advertised product simply decreases ad yield over time and therefore it even wastes productivity for 0 return in the long run./


The virtue signaling part of online tech discourse is probably my biggest dissatisfaction with it these days. I hope you're using Kagi because Google is unethical oh and using Matrix because Discord is evil oh but you're using Gemini because the web is all cursed and sorry you're using Signal for your private communications right? Twitter is evil now Mastodon right? Hope you aren't using Reddit but Lemmy. "Enshittification!!"

Meanwhile the numbers show where the users actually are. I pay for YouTube, Telegram, and Nebula, self host Matrix, use and run Bluesky infrastructure, and a few other things but I'm the first to admit I'm in the minority. Not only that but it's time consuming! Meanwhile in tech discourse everyone is using Kagi for everything and "it's a breath of fresh air" or whatever.


That's why the saying "actions speak louder than words" exists...

In any marketing research it is well-known that what people say they would pay for and what they actually pay for are two different things. Hence also the mantra about MVPs and going to market as soon as possible.

But specifically on AnandTech and "written journalism", I think they are right about the "written" part. These days the topic and hardware reviews are all over Youtube.


A huge part of this is that there is often no other way to pay, and when there is, there's a ton of friction involved. We know how much little frictions add up when people are trying to buy a product. They must have even more impact on someone who wants to donate. I definitely spend more on my Apple devices due to easy Apple Wallet integration. I'm not going to pretend I'll go out and start donating to all of these websites. But if the anti-ad-blocker modal had something as easy as an Apple Pay button, I'd definitely consume more of that style of content, if the fees were reasonable.


I'm not sure what skeletons you think are being pulled out of the closet. I do the same as the OP, if there is an option to pay I do that, but I will always ad block. I feel for you if you can't make money without ads, but I'd rather see the world burn than be ad driven.

I pay for many many subscriptions for content I like. Also, I don't see any "virtue signalling" anywhere. I don't want ads because they are hostile and not in my best interest. They significantly lower the quality of my life. It's as simple as that.


You cannot see the virtue signalling unless you see the traffic metrics and revenue sheets.

Everyone says they pay to support, very few people actually do.

Just look at how it is a matter of course to post an archive.is link any time a paywalled article is posted. It's so pervasive and widespread that people don't even think about it.


> the numbers unequivocally show that people are totally full of shit

What numbers?

Where can I pay to replace ads with something that isn't orders of magnitude more expensive? Basically any single-site subscription I've seen fails that test. If you're citing that kind of subscription, then that evidence doesn't work here.


The only one I've found that passes my test (no ads if you subscribe, and, equally important, all the tracking crap is also gone) is Ars Technica. I check the stories several times a week, so I'm happy to subscribe under those terms.

For everything else I use ad blockers.


[flagged]


Ha, you're assuming GP visits Youtube. (I don't, rather in this camp: https://news.ycombinator.com/item?id=41400286 )


Their online pre-recorded courses cost 1000s of dollars.


I can imagine that's never going to work.

Very few people can afford it, and those who can afford it want more, e.g. Q&A, graded assignments, etc. It's almost never a "record these once and we are done" thing.


Getting adequate good quality sleep is the first place to start. Track your sleep, maintain low sleep debt.

Intermittent fasting helps with a lot of things: portion control, snacking, insulin sensitivity, and mental clarity.

Resistance training is essential. I go to the gym and play around with weights. Having a personal trainer makes a huge difference here. I go first thing in the morning 5 days a week. Either that or I don’t go. Early morning workout also makes me feel like the rest of my day is easier in comparison to the literal heavy lifting I’ve started my day with.

For cardio, you want some kind of explosive high intensity workouts and some sustained medium intensity ones. Resting heart rate is a decent way to measure your improvements here and I’ve noticed that the higher my heart rate goes during workouts, the lower it goes during rest.


Any idea how low resting rates can go? And is it always a good thing or can it be too low?


https://www.heart.org/en/health-topics/high-blood-pressure/t...

60-100 is considered a normal resting heart rate, generally the fitter you are the lower you'll be in that range. Outside that range (above or below) is a potential health concern, but if you're physically fit a resting heart rate below 60 is not an immediate cause for concern. 40-60 can be a normal resting heart rate for physically fit individuals. You can always contact your doctor if you're concerned about it.

I mostly know these numbers because my resting heart rate was down to 45 for a few years and I did talk to my doctor about it.


I find the Waking Up app the closest to this. Others like Headspace feel heavily commercialised, with an aim of improving “productivity”. If anyone is looking for guided meditation, I suggest checking out Waking Up.


I don't think the original series of guided meditations from the Headspace founder was aimed at productivity. From what I remember, it's a pretty typical breath- and body-scan-centered vipassana style. I can't comment on any of the subsequent instructors or lessons, though; I haven't tried them.


> Others like headspace feel heavily commercialised

The one you pointed to is $130 a year.


The Waking Up app offers scholarships of up to 100% on the subscription, no questions asked (you can pick your rate if you so choose).

Their philosophy on monetization is that they don't want anyone who can't afford it to be excluded, but are still putting out a valuable service and it's fair to charge money for it.

I'm a very happy paying user of the app, and I deeply appreciate their willingness to give it away to people who can't afford it.


Their point is that Headspace seems to sell meditation a bit more as a means to optimizing one’s life, while Waking Up sells meditation more abstractly, as a means to more fully live one’s life.

The app is free (or heavily discounted) for those that can’t pay. Nor do they seem to optimize conversion flows, run ads, or pursue growth as aggressively as Headspace has.


I got the same impression at first. Then I saw the drawing tools and I thought that’s a lot to ask from end users. Messaging is confusing for sure.


This is really cool, thanks for sharing!


TL;DR: use Mistral models on DeepInfra. Mistral may have updated their prices since, too.

