
The title seems quite disingenuous.

A better description would be "A PHP based web app which calls OpenAI's Whisper API to transcribe speech"




I agree. Kudos to the author for sharing a working example of using OpenAI's PHP Whisper client though. Digging a bit deeper into the organization that released this provides more context: https://beyondco.de/. It appears to be Laravel oriented.


People mainly put the tech stack in the title for marketing reasons.

The title describes what it does. I think you're making a mountain out of a molehill.


Why PHP though? Couldn't the whole thing run completely in the browser?


Many people on HN infamously called Dropbox just an rsync script, right?

It's usually all in the details and delivery (and y'know, we're lazy and lack the time to set stuff up locally).

Though I wouldn't really knock anything free and open source either way.


The objection here is more structural than technical. The famous Dropbox objection was "anyone could do this", even though most people would not have the wherewithal to do so. The objection here is that the open source project relies on a closed source paid service to do all the heavy lifting. Someone is going to need to foot the bill, which means this project will eventually have to answer some tough questions about funding and about what the project actually delivers.


Whisper is open source.


Where can I download the source from?



This is not open source. The wrapper may be, but it's using a non-open-source cloud service.


This thread is about the wrapper, which is open source.

You can run Whisper locally, and it is open source.

Feel free to fork this open source project and adapt it to a locally run Whisper instance.
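For what it's worth, here is a rough sketch of what that swap could look like. It is in Python rather than the project's PHP, and it is not taken from the project itself. It assumes the `openai-whisper` package, whose `load_model` and `transcribe` calls are the only real API used; the `Transcriber`, `LocalWhisperTranscriber`, and `transcribe_file` names are made up for illustration.

```python
from typing import Protocol


class Transcriber(Protocol):
    """Anything that can turn an audio file into text (hypothetical interface)."""

    def transcribe(self, audio_path: str) -> str: ...


class LocalWhisperTranscriber:
    """Runs the open source Whisper model on this machine."""

    def __init__(self, model_name: str = "base") -> None:
        self.model_name = model_name
        self._model = None  # loaded lazily, since the weights are large

    def transcribe(self, audio_path: str) -> str:
        if self._model is None:
            # Assumption: `pip install openai-whisper`, plus ffmpeg on the PATH.
            import whisper

            self._model = whisper.load_model(self.model_name)
        result = self._model.transcribe(audio_path)
        return result["text"].strip()


def transcribe_file(audio_path: str, backend: Transcriber) -> str:
    """The rest of the app depends on this function, not on a specific backend."""
    return backend.transcribe(audio_path)
```

Something like `transcribe_file("meeting.mp3", LocalWhisperTranscriber("base"))` would then run fully locally (the file name is a placeholder), and a hosted backend could sit behind the same `transcribe` method without the rest of the app changing.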


[flagged]


Please don't break the site guidelines like this, regardless of how wrong someone is or you feel they are.

Rather, please make your substantive points thoughtfully and without name-calling or swipes.

https://news.ycombinator.com/newsguidelines.html


It's disingenuous because literally none of the code transcribes or translates audio.

This is NOT an app that transcribes, or translates, audio.

This is a front end to another company's service.

In its defense, it is a useful front end, because getting Whisper running locally was a pain in the butt thanks to PyTorch's specific Python requirements (not too old, not too new... juuuuust right).

This app also looks like it does very useful things with what Whisper outputs.

But it is 100% disingenuous because it does none of the things it markets itself as doing. I was expecting it to run Whisper locally, not call out to a paid service.


Download Whisper and the models, run it in a Docker container as a server, and it's Open Source.

Honestly, try to see it as a favor that it's using OpenAI's endpoint, since some of us don't think it's feasible to keep a GPU-loaded server running 24/7 just for some occasional transcriptions.


This is a really bad comparison. Expedia didn't build their services in a way that makes users think the hotel they are booking belongs to Expedia. No one is going to buy an Air France flight from them and expect the plane to be flown by Expedia employees.


....only on HN


I would expect “transcribe any audio” to mean music transcription, personally.


I think that's fair, but I also think that it's mostly just musicians who would ever think that. I don't think the average person (geek or not) would assume that. I'm a musician and I didn't think it'd write sheet music.

As a geek who's done basic music arrangement, I also know that it's an incredibly hard problem once you introduce modern instruments. Even staying with just classical ones, differentiating between a violin part and a viola part? Or even a cello playing a high note vs. a viola? Like... wow. That would be SO hard.

We're barely getting words right. I don't think there's any way we're anywhere close to transcribing a full band or orchestra in a meaningful way. Extracting the melody? Sure. Chord changes? Sure. Actually doing an accurate and even remotely complete transcription? Incredibly hard.


Orchestration is its own ball of wax.

If there was software I could just dump an MP3 into and get a basic chart with chords and a melody...that'd be pretty amazing. I've done it, by hand, and the results were even published... it ain't easy. 30 minutes of moderately complex pop rock took a couple of months off and on.



