Show HN: Whisper.cpp and YAKE to Analyse Voice Reflections [iOS] (apps.apple.com)
6 points by ArminRS on Feb 20, 2023 | 3 comments
Six months ago, I went full-time indie, but I haven't released anything so far. The products just never felt good enough for me to publicly say this is what I'm doing now.

To get out of this mindset, I decided to make an app for myself in a week, add monetization, release it and move on. The app idea was simple: Reflect on your day by answering the same four questions out loud. The answers are transcribed and with regular use you can see what influences you the most and take action. All on-device, as otherwise I wouldn't feel comfortable sharing my thoughts.

I had all core features working within a day by simply modifying an existing example app. However, I was dissatisfied with iOS's built-in offline transcription due to a lack of punctuation and the speech recognition permission prompt, which made it seem like data would leave the device. I decided to use whisper.cpp [0] (small model) instead. This change led to many others, as I now felt too little of the app's code was mine, e.g.:

  - Added automatic mood analysis. First using sentiment analysis, then changed to a statistical approach
  - Show trends: First implemented TextRank to provide a summary for an individual day, then changed it to extract keywords to spot trends over weeks and months. Replaced TextRank with KeyBERT for speed and n-grams, then BERT-SQuAD, and ended on a modified YAKE [1] for subjectively better results. (Do you know of a better approach?)
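
To make the trend idea concrete, here is a minimal sketch of spotting recurring keyphrases across days. It is not the app's code and it is not YAKE: it swaps in plain n-gram counting (how many days a phrase appears on) for YAKE's statistical scoring. The stopword list, the function names and the sample transcripts are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real app would need a
# fuller list for every supported language).
STOPWORDS = {"a", "an", "and", "the", "i", "my", "to", "of", "in", "was",
             "it", "at", "with", "for", "on", "me", "had"}


def candidate_phrases(text, max_n=2):
    """Return the set of 1..max_n word phrases that contain no stopword."""
    words = re.findall(r"[a-z']+", text.lower())
    phrases = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(w in STOPWORDS for w in gram):
                phrases.add(" ".join(gram))
    return phrases


def keyword_trends(daily_transcripts, top_k=3):
    """Rank phrases by the number of days on which they occur."""
    day_counts = Counter()
    for text in daily_transcripts:
        # A set is returned, so each phrase counts at most once per day.
        day_counts.update(candidate_phrases(text))
    # Sort by day count (descending), then alphabetically for a stable order.
    ranked = sorted(day_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical transcripts, one per day.
transcripts = [
    "Slept badly and skipped the gym.",
    "Good sleep, went to the gym with Anna.",
    "Long day at work, but the gym helped.",
]

print(keyword_trends(transcripts, top_k=1))  # [('gym', 3)]
```

Counting days rather than raw occurrences keeps one long, repetitive answer from dominating the trend. Note that "slept" and "sleep" stay separate here, so a real version would probably want stemming or lemmatization.
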
As a result, this tiny app took me over a month, but it still has its flaws:

  - Transcription is not live but performed on recordings, so if you immediately want the transcript of your most recent answer, you have to wait.
  - Mood and keyphrase extraction are optimized for my languages and way of speaking, so they might not generalize well.
  - Music in the background can result in nearly empty transcripts.
Nevertheless, after using the app regularly and enjoying it, I feel ready to release. Hope you will find the app useful too.

[0] Show HN: Whisper.cpp https://news.ycombinator.com/item?id=33877893

[1] YAKE: https://github.com/LIAAD/yake

It's a great idea to give people a chance to reflect. Really like this! Technically, I'm just wondering: the app says 22 languages transcribed, offline, and it's ~480 MB in size. I thought only the large model (> 1 GB) had more than English (and the models don't compress). And when you say transcribed, do you mean transcribed to English?

BTW - to modify an existing app, can you just fork the repo and open that fork in Xcode? I don't know iOS but I'm getting into macOS dev and I like this idea of modifying an existing app, as I've already got so much boilerplate I think I can reuse.

For each size there is an ".en" model and a multilingual one. I'm using the multilingual one, so you talk in German and receive a German transcript. On page 23 of the paper you can see the WERs (word error rates) of each language (https://cdn.openai.com/papers/whisper.pdf). I limited my app to the languages that had a max of around 20% WER.
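
As a rough sketch of that cutoff, assuming you have copied the per-language WERs into a table yourself: the language codes and numbers below are placeholders, not figures from the paper, and `supported_languages` is a hypothetical helper name.

```python
# Hypothetical WER values in percent. These are placeholders, NOT figures
# taken from the Whisper paper.
wer_by_language = {
    "lang_a": 5.0,
    "lang_b": 12.5,
    "lang_c": 20.0,
    "lang_d": 35.0,
}

MAX_WER = 20.0  # the "around 20%" cutoff described above


def supported_languages(wer_table, max_wer=MAX_WER):
    """Return the languages at or below the cutoff, lowest WER first."""
    kept = [(wer, lang) for lang, wer in wer_table.items() if wer <= max_wer]
    return [lang for wer, lang in sorted(kept)]


print(supported_languages(wer_by_language))
```
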

Sure, if the repo has an xcodeproj you can just open it in Xcode, change the signing and improve on it. (Just make sure to always respect the licenses.) If you want to play with whisper.cpp you can use this SwiftUI demo: https://github.com/ggerganov/whisper.cpp/tree/master/example...

Wow, thanks for the info and paper link! You are awesome. Thanks also for the assurance on how to fork projects. :) Good luck with your app! I'm also launching my Whisper-related voice memo transcription app soon: https://apps.apple.com/app/wisprnote/id1671480366
