I agree. Kudos to the author for sharing a working example of using OpenAI's Whisper API from a PHP client, though. Digging a bit deeper into the organization that released this seems to provide more context: https://beyondco.de/. It appears to be Laravel-oriented.
The objection here is more structural than technical. The famous Dropbox objection was "anyone could do this", even though most people wouldn't have the wherewithal to do so. The objection here is that the open source project relies on a closed-source paid service to do all the heavy lifting.
Someone is going to need to foot the bill, which means this project will eventually have to answer some tough questions about funding and about what the project actually delivers.
It's disingenuous because literally none of the code transcribes or translates audio.
This is NOT an app that transcribes, or translates, audio.
This is a front end to another company's service.
In its defense, it is a useful front end: getting Whisper running locally was a pain in the butt because of PyTorch's specific Python requirements (not too old, not too new... juuuuust right).
This app also looks like it does very useful things with what Whisper outputs.
But it is 100% disingenuous because it does none of the things it markets itself as doing. I was expecting it to run Whisper locally, not call out to a paid service.
Download Whisper and the models, run it in a Docker container as a server, and it's Open Source.
Honestly, try to see it as a favor that it's using OpenAI's endpoint, since some of us don't think it's feasible to have a GPU-loaded server running 24/7 just for occasional transcriptions.
This is a really bad comparison. Expedia didn't build their services in a way that makes users think the hotel they are booking belongs to Expedia. No one is going to buy an Air France flight from them and expect the plane to be flown by Expedia employees.
I think that's fair, but I also think it's mostly just musicians who would ever think that. I don't think the average person (geek or not) would assume that. I'm a musician and I didn't think it'd write sheet music.
As a geek who's done basic music arrangement, I also know that that's an incredibly hard problem once you introduce modern instruments. Even staying with just classical ones, what about differentiating between a violin part and a viola part? Or even a cello playing a high note vs. a viola? Like... wow. That would be SO hard.
We're barely getting words right. I don't think we're anywhere close to transcribing a full band or orchestra in a meaningful way. Extracting the melody? Sure. Chord changes? Sure. Actually doing an accurate and even remotely complete transcription? Incredibly hard.
If there were software I could just dump an MP3 into and get a basic chart with chords and a melody... that'd be pretty amazing. I've done it by hand, and the results were even published... it ain't easy. Thirty minutes of moderately complex pop rock took a couple of months, off and on.
A better description would be "A PHP-based web app that calls OpenAI's Whisper API to transcribe speech."