Show HN: YakGPT – A locally running, hands-free ChatGPT UI (yakgpt.vercel.app)
287 points by kami8845 on March 30, 2023 | 118 comments
Greetings!

YakGPT is a simple, frontend-only ChatGPT UI you can use either to chat normally or, more excitingly, to chat hands-free using your mic + OpenAI's Whisper API.

Some features:

* A few fun characters pre-installed

* No tracking or analytics; OpenAI is the only thing it calls out to

* Optimized for mobile use via hands-free mode and cross-platform compressed audio recording

* Your API key and chat history are stored in browser local storage only

* Open source: you can either use the deployed version at Vercel or run it locally

Planned features:

* Integrate Eleven Labs & other TTS services to enable full hands-free conversation

* Implement LangChain and/or plugins

* Integrate more ASR services that allow for streaming

Source code: https://github.com/yakGPT/yakGPT

I’d love for you to try it out and hear your feedback!
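
For the curious, the hands-free loop boils down to two API calls. A rough sketch (not the exact code from the repo, just the public OpenAI endpoints):

    // 1) Send the recorded audio to Whisper, 2) feed the transcript to the chat model.
    async function transcribe(audio: Blob, apiKey: string): Promise<string> {
      const form = new FormData();
      form.append("file", audio, "recording.webm");
      form.append("model", "whisper-1");
      const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
        method: "POST",
        headers: { Authorization: `Bearer ${apiKey}` },
        body: form,
      });
      return (await res.json()).text;
    }

    async function reply(transcript: string, apiKey: string): Promise<string> {
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
        body: JSON.stringify({
          model: "gpt-3.5-turbo",
          messages: [{ role: "user", content: transcript }],
        }),
      });
      return (await res.json()).choices[0].message.content;
    }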




Nice. It took about a minute to clone it, run it, enter my API key, and get started. The speech-to-text worked flawlessly.

Most people can talk faster than they can type, but they can read faster than other people can talk. So an interface where I speak but read the response is an ideal way of interfacing with ChatGPT.

What would be nice is if I didn't have to press the mic button to speak -- if it could just tell when I was speaking (perhaps by saying "hey YakGPT"). But I see how that might be hard to implement.

Would love to hook this up to some smart glasses with a heads-up display where I could speak and read the response.


> Most people can talk faster than they can type

Most people I know type faster than they can talk. Also more accurate. I find talking a horrible interface to a computer while sitting down. On the move it is another story entirely of course.

By the way, chatgpt is not very fast either, so usually I type something in the chat and continue working while it generates the response.

> smart glasses

I just tried that; it works quite well. However, pressing the mic button kind of messes up that experience.


Normal/average talking is ~150 WPM. Average typing speed is about 60-70. Is 150+ WPM a requirement to become anonzzies' friend?


The only person I've met in real life who could type as fast as I speak was an immigration officer conducting my naturalization interview. The sound of the keyboard going: trrr-trrr! And he was amazingly accurate too: all the unnecessary things I said for conversation's sake were there, exactly as I said them. But I think my wife would beat him easily…


Or a really slow talker?

High WPM might be achievable with shorthand though.


The advantage of course is you're not tied to a keyboard / desk. So one could potentially be doing Internet research while hiking.


Yes, and that with smart glasses seems interesting.


It wasn't so smooth for me.

I gave up at

    Creating an optimized production build ...TypeError: Cannot read properties of null (reading 'useRef')


Oh, my install failed at:

    Failed to compile.

    pages/index.tsx
    `next/font` error:
    Failed to fetch `Inter` from Google Fonts.


    > Build failed because of webpack errors
Apparently because it can't fetch a font from Google. A yarn build should distinguish between assets that are critical (JS/TS code, templates, CSS) and assets that are not (freaking fonts).

edit: hacketyfixey, let's punch the thing in the face until it works:

    ./pages/index.tsx:
    2:  // import { Inter } from "next/font/google";
    12: // const inter = Inter({ subsets: ["latin"] });
(I am sorry)


Haha, I'll set up a docker image that people can pull down!


Thanks, but FWIW I'd also be interested in why it doesn't build. Shouldn't yarn/npm/gulp/whatever manage dependencies?


I've not found a dependency manager that works reliably across multiple operating systems and operating system versions.


I did, just not in the JavaScript ecosystem.


I tried it, it looks good! I had to modify the code to accept 8000 tokens for ChatGPT. It would be good if it saved the JSON payload of the responses as well.

It makes two external calls to a JavaScript CDN, for the microphone package and something else. It would probably be best if it were localhost calls only, since it handles an API key.
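
For illustration (not my exact diff): the cap ends up as the `max_tokens` field on the chat completion request, bounded by the model's context window.

    // Illustrative request body only; model name and prompt are placeholders.
    const body = {
      model: "gpt-4",                               // an 8k-context model
      messages: [{ role: "user", content: "..." }], // full chat history in practice
      max_tokens: 8000,                             // raised cap on the completion length
    };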


What'd you modify? I'm curious.


I love the concept of this and other alternate ChatGPT UIs, but I hesitate to use them and pay for my calls when I could use chat.openai.com for free.

Any chance you could integrate the backend-api, and let me paste in my Bearer token from there?


Hey! I definitely understand the reservation. This is me as well. My reasons for using the UI at this point:

* GPT-4 is decently faster when talking straight to the API

* The API is so stupidly cheap that it's basically a rounding error for me. Half an hour of chatting to GPT-3.5 costs me $0.02.

Would be curious what you mean by integrating the backend-api?


GPT-3.5 is really cheap (prompt and completion = $0.002 / 1K tokens), but GPT-4 is around 20 times more expensive (prompt = $0.03 / 1K tokens + completion = $0.06 / 1K tokens).

But the benefit of using the API is that you can change the model on the fly: chat with 3.5 until you notice it's not responding properly and then, with all the history you have (probably stored in your database), send one bigger request with GPT-4 as the selected model for a probably better response.
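
In code it's just a per-request parameter; a sketch against the public API (key handling omitted):

    type Msg = { role: "system" | "user" | "assistant"; content: string };

    // The full history is re-sent on every request, so the model can change per call.
    async function continueChat(history: Msg[], model: "gpt-3.5-turbo" | "gpt-4", apiKey: string) {
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
        body: JSON.stringify({ model, messages: history }),
      });
      return (await res.json()).choices[0].message.content as string;
    }

    // Start cheap, escalate only when the answers degrade:
    // await continueChat(history, "gpt-3.5-turbo", key);
    // await continueChat(history, "gpt-4", key);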

I really wish the interface on chat.openai.com would let me switch between models within the same conversation, in order to 1) not use up the quota of GPT-4 interactions per 3 hours as quickly and 2) not strain the backend unnecessarily when starting the conversation with GPT-3.5 is efficient enough, until you notice you'd better switch models.

OpenAI already has this implemented: when you use up your quota of GPT-4 chats, it offers to drop you down to GPT-3.5 in that same conversation.


Sure, but GPT-4 through the UI costs $20 per month, which is a lot of API calls.


Isn’t it 10 per hour?


25 / 3 hrs


How is it that cheap?! I ran three queries on LangChain yesterday with two ConstitutionalPrompts and it cost $0.22 - made me realize that deploying my project on the cheap could get expensive quickly.


GPT3.5 Turbo pricing is 10k tokens or ~7500 words for $0.02. Though note that every API request includes the entire chat context and charges for input & output tokens. https://openai.com/pricing
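
In code form, the back-of-the-envelope math:

    // Rough cost estimate for gpt-3.5-turbo at $0.002 per 1K tokens
    // (March 2023 pricing; prompt and completion tokens are both billed).
    function estimateCostUsd(promptTokens: number, completionTokens: number): number {
      return ((promptTokens + completionTokens) / 1000) * 0.002;
    }

    // ~10K tokens total (roughly 7,500 words) comes out to about $0.02:
    console.log(estimateCostUsd(6000, 4000)); // 0.02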


You need to check which model you are using. Also, LangChain runs through the model several times, with an increased token count on each successive call.


Yeah, I assumed it would be making several calls, but it's still more expensive than OP mentioned. I think the issue is that I'm using davinci-003.


Yeah, davinci-003 is gonna be gpt3, which is more expensive than 3.5.

One more anecdote: I've been running a half dozen gpt3.5 IRC bots for a few weeks and their total cost was less than a dollar. A few hours of playing around with LangChain on gpt3 cost me almost $4 before I realized I needed to switch to 3.5, though even then it still uses a ton of tokens every chain.


Thanks, I'll do that later


I'd love to see a comparison of the average cost of using this with the OpenAI API versus subscribing to ChatGPT Plus.

Maybe I'll have to try this for a month and see if it ends up costing more than $20. Thanks for creating it!


Wow! Is it really that cheap? GPT4 is much more expensive, I imagine?


GPT-4 is decently more expensive -- I personally really like & use the therapist character a lot. In this scenario the session would cost me less than $1 which is still much cheaper than any therapist I've used previously :)


What is your setup?


You can try the extension I built [0] which uses your existing ChatGPT session to send requests.

[0] https://sublimegpt.com


The overlay option is great. Any chance of a Firefox version?


Remember that using the API comes with privacy guarantees that using the ChatGPT site does not. tl;dr: anything sent through the API won't be used to train the model and will be deleted after a month.

https://help.openai.com/en/articles/5722486-how-your-data-is...


This is a good point I'll add!


> Run locally on browser – no need to install any applications

That's not what "run locally" means. This isn't any more "local" than talking to chatgpt directly, which is never running locally.


Hey, run locally in this case means: YakGPT has no backend. Whether you use the react app through https://yakgpt.vercel.app/ or run it on your own machine, I store none of your data. I will try and make this wording clearer!


In that case you're basically offering a browser-based client. 'Locally' strongly suggests this is running entirely on the machine (vs. making API calls). Going to break a lot of hearts out there with the wording as it is.


It is more local than talking to ChatGPT directly. OpenAI stores all your requests on their server; this saves them on your computer. The title also claims it's a UI, which always, for now, runs locally.


Honestly your "idea generator" blew my mind. Would love to see a section that includes a larger catalog of prefilled prompts.

I'm thinking: What would a GPT project manager do? What would a GPT money manager do? What would a GPT logistics manager do? GPT Data Analyst, Etc.


> Run locally on browser – no need to install any applications

> Please enter your OpenAI key

...

Do people just not get it?

I would in fact rather give all my company secrets to this random dude than OpenAI.


There are instructions on how to run the GUI from localhost, and both the title and the line linking to their own hosting tell you that you can run it locally.

It seems they are genuine, and they phrase it exactly as it is. The only thing I would have maybe wanted to see in the title is "open-source" or free software.


Everything still gets sent to OpenAI. “Locally hosted” means the UI, not the AI.


OP already makes it clear that they are just a front-end.


Love the idea of prompt dictation. Taking that idea a step further, would it be possible to have a feature where ChatGPT responses are spoken back to the user?


War Games


"Do you want to play a game?"


This is fast. And talking to it is a nice touch. Consider adding text to speech too :)

One feature I am missing from all these front ends is the ability to edit your text and generate a new response from that point. The official ChatGPT UI is the only one that seems to do that.


Chat-with-gpt has that; we use it in our org as an alternative ChatGPT interface: https://github.com/cogentapps/chat-with-gpt


In the official UI, if you edit a message and get a new response, you can still always go back to any of your previous messages and continue from there. Basically, the history is like a tree in the official UI. The history in all other frontends, including this one, is linear.
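
Roughly, the shape is something like this (illustrative types, not taken from any of these codebases):

    // Editing a message appends a sibling branch instead of overwriting what came after.
    interface MessageNode {
      id: string;
      role: "system" | "user" | "assistant";
      content: string;
      children: MessageNode[]; // alternative continuations created by edits / regenerations
      activeChild?: number;    // which branch is currently shown
    }

    // The linear conversation on screen is just one root-to-leaf path through the tree:
    function visibleThread(node: MessageNode): MessageNode[] {
      const next = node.children[node.activeChild ?? 0];
      return next ? [node, ...visibleThread(next)] : [node];
    }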


I've never seen this one before. It has several features I've been looking for. Has it been working well for your organization?


It has, especially since we don't want to go through the accounting nightmare of buying everyone ChatGPT+ accounts, so just inviting everyone to the OpenAI org and giving out API keys to be used in tools like this one has been good.


Good to know, thank you.


I added whisper to that (was merged) so you can talk to it as well.


In the official UI the chat history is like a tree. If you edit a message, it branches the conversation off from that point. You can always go back to any message in the tree and see the conversation from there on. Can you do that in your UI? No third-party UI has done that so far.


I am not the author, just a contributor, but it would not be very hard to add.


Hey! You can edit past messages you've submitted and they will generate a new response that overwrites whatever happened in the conversation previously. If you're talking about a tree-like struct where you can have different branches, then true, only the official UI has it AFAIK :)


Looks cool! Are you planning on adding more customization to be able to influence the AI? See https://bettergpt.chat/ (it's also open source and uses API in the browser). Basically with that frontend you can control the role of all messages (e.g. add system messages) and also edit them all to better influence the AI in some cases.


Editing the prompts (which are currently submitted via the system message similar to your linked app) is a great idea. I'll add it to the to-do list :)
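
For reference, with the raw API every message carries an explicit role, so a frontend can expose all of them for editing rather than only the user's turns. Something like:

    // Illustrative only: the messages array sent to the chat completions endpoint.
    type Role = "system" | "user" | "assistant";
    interface ChatMessage { role: Role; content: string; }

    const messages: ChatMessage[] = [
      { role: "system", content: "You are a terse, skeptical code reviewer." },
      { role: "user", content: "Review this function for edge cases..." },
      // An injected assistant message can steer the tone of the next reply:
      { role: "assistant", content: "Understood. Send the code." },
    ];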


BRO. Your transcription is SO fast. I've hacked at a similar project passing to the Whisper API, and honestly I was already blown away by its speed and accuracy (as was anyone I showed it to), but your implementation is so much faster, both in speech-to-text and in the response from their API. I will absolutely use this.


Very cool. I use a custom local UI as well, based on a fork of a similar project called ChatPad (https://github.com/deiucanta/chatpad). That also uses Mantine UI, and lets you create and save prompts just like chats. Data is stored locally using IndexedDB. I embedded it in an Electron app, which lets me run it from my dock rather than a terminal. But what's missing is speech-to-text, so it's great to see this project has that.

There are a few drawbacks to local, I've discovered. For example, I doubt the new plugins can be extended beyond ChatGPT's web UI. Also, it doesn't stream response tokens as they're generated, which is a pain. I haven't looked into whether the OpenAI API lets you do that though.
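
Turns out it does, via `stream: true` (server-sent events). A rough, untested sketch of consuming it:

    // Each `data:` line of the stream carries a token delta, ending with `data: [DONE]`.
    async function streamChat(
      messages: { role: string; content: string }[],
      apiKey: string,
      onToken: (t: string) => void
    ) {
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
        body: JSON.stringify({ model: "gpt-3.5-turbo", messages, stream: true }),
      });
      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = "";
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? ""; // keep any partial line for the next chunk
        for (const line of lines) {
          if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
          const delta = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content;
          if (delta) onToken(delta); // append to the UI as tokens arrive
        }
      }
    }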

Nice work!


Looks great. Super interesting to browse other people's code. I'm working on a desktop app for ChatGPT.

https://github.com/EzzatOmar/delegate


Given that Vocode (realtime audio, LLMs, etc.) came out a few days ago, could you speak to how yours compares to it?

https://github.com/vocodedev/vocode-python


So, is it time, finally, to entertain spam callers with nice, polite, _long_ conversations? About my credit card numbers and passwords to my accounts? My personal record is 40 minutes - some nice guys were trying to install a remote-controlled door on my MacBook and thought they were very close to success. There are existing services, like https://jollyrogertelephone.com/ - but they are not as good as me. Still, using myself to entertain the robocallers is fun but expensive; it would be interesting to see if AI is ready to help here…


Cool! I tried out the speech-to-text and it was instant and accurate; I had no idea Whisper was that good.

Do you know their privacy policy for our voices? Do they train on it, listen to it, etc.?


If you're running it locally, they don't and cannot.

If you're using the hosted Whisper, they can; however, they don't specifically talk about it.


I absolutely love this! The UI is nice and responsive, and this is the first ChatGPT UI with voice recognition that works outside of Chrome!

I kind of want to throw this up on a server for my housemates to use. I am currently the only person with an OpenAI account, so I would like the ability to embed my API key. Minor feature request :-)


Hi ChatGPT! Let me register using my personal information, then tell you what my tasks are at work, what I'm interested in, what I'm struggling with in life, and a bunch of other sensitive personal information. I trust you completely, and am sure a nice AI such as yourself would never use my personal data for anything.


Barking up the wrong tree; this post is for a third-party tool.


The only thing I'd suggest considering is adding some sort of authentication. If I deploy this on a server so I can reach it from my mobile, on the go, and it has my API credentials, I wouldn't want anyone who stumbles upon the page to be able to use ChatGPT at my expense.
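
For example, if you self-host it, a minimal gate could be HTTP Basic Auth in a Next.js middleware. A sketch (the env var names are made up; this isn't part of YakGPT):

    // middleware.ts at the project root
    import { NextRequest, NextResponse } from "next/server";

    export function middleware(req: NextRequest) {
      const auth = req.headers.get("authorization");
      if (auth?.startsWith("Basic ")) {
        const [user, pass] = atob(auth.slice(6)).split(":");
        if (user === process.env.BASIC_AUTH_USER && pass === process.env.BASIC_AUTH_PASS) {
          return NextResponse.next();
        }
      }
      // Anything without valid credentials gets a browser auth prompt.
      return new NextResponse("Authentication required", {
        status: 401,
        headers: { "WWW-Authenticate": 'Basic realm="yakgpt"' },
      });
    }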

Otherwise, it really looks good.


Your API key and chat history are stored in browser local storage only.


Well, I'd love to see a change there as well, then: if I wanted to share the interface with my family, I wouldn't want to re-enter everything on every device they might access the page from.



I've been playing around with your Idea Generator persona for the last 15 minutes and have been absolutely blown away. Excellent prompt engineering.

As mentioned by others, it would be great to customize or write new personas/prompts.

Could you also add a voice chatbot using Vocode? It could be an alternative UI for each of the personas.


So if you add audio output so I can talk to my computer like in Star Trek, I'll Venmo you $100. Then I want a command-line module so I can ask it to write files to the local disk and run them, so I can deploy code it's just written to AWS; that's worth at least another $100.


It's not that hard to do, but I think this is lowballing. If you want a talented programmer to do something for you, you should be willing to pay them $150/hr. And I'm assuming this is more than an hour of work.


It would be great if I could just press the space bar in the app and it let me talk to it. Keyboard shortcuts!
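
Something like this would do it (a sketch; toggleRecording stands in for whatever starts/stops the mic in the app):

    import { useEffect } from "react";

    function useSpaceToTalk(toggleRecording: () => void) {
      useEffect(() => {
        const onKeyDown = (e: KeyboardEvent) => {
          // Don't hijack the space bar while typing in the chat box.
          const tag = (e.target as HTMLElement)?.tagName;
          if (tag === "TEXTAREA" || tag === "INPUT") return;
          if (e.code === "Space") {
            e.preventDefault();
            toggleRecording();
          }
        };
        window.addEventListener("keydown", onKeyDown);
        return () => window.removeEventListener("keydown", onKeyDown);
      }, [toggleRecording]);
    }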

BTW I have a lot of these ChatGPT UI apps installed, mostly free and open-source. Perhaps this is really the era of going back to just talking to a chat interface like the old times.


This is very well made and designed. I will likely use this instead of the actual ChatGPT UI, since their API is a lot cheaper than the $20/month pricing at my usage level.

Interesting note: I tried speaking Mandarin Chinese into the mic and it auto-translated what I said into English.


Just tried this in both English and Korean. Fumbled a bit with voice control but worked well once I got it going. Very nice. Korean prompts got translated to English so had to tell ChatGPT to respond in Korean to get full non-English UX.

Well done.


It sounds like a nice modifier to add a one-liner to the prompt: "Return your response in $user.language".


It's pretty bad to ask people to enter a private secret key on a website (any website, I mean).


They provided an option to build it locally and run it yourself. But yeah, I wish there were a common proxy protocol that would allow a website to access private resources without exposing private keys.


OpenAI should implement an oauth authorization server and allow developers to use "Login with OpenAI account" into their apps.


I agree, this is the best solution. I'm sick of countless projects with key input fields where I have to Ctrl-C/Ctrl-V every time.


Not to mention the ludicrously gaping security issue that this is. My guess is they want to push people to the plugins tho.


Maybe a small video demo would be an ok alternative?


What alternative would you suggest for a free service that depends on OpenAI APIs? It's easy enough to generate an API key for this service and delete it afterwards.


Why? OpenAI keys can be revoked at any time, and OpenAI allows you to set soft and hard limits for billing as well.

You can also generate multiple keys, so if one app misbehaves, you don't need to rotate all the keys, just the one that misbehaves.

This is assuming the API keys can only do generation. If they can access billing details or something, it's very different of course.


> Why?

Because it's bad practice to provide sensitive information to untrusted sources, and if you are an ethical developer, it's an anti-pattern to write software that encourages bad practices.

Your credit card company will reverse any unauthorized charges. Will you email me all your credit card info?


If I could generate a credit card number just to send you money then yeah sure.


> It's pretty bad to ask people to enter a private secret key on a website (any website, I mean)

I answer back to myself: I misunderstood, since the developer's idea is to run it locally at http://localhost:3000, whereas I got scared by the demo.

Congrats to the developer!


I installed it locally about an hour ago and have been running it through some paces. Nice work! (In addition to the predefined prompts, I like the API usage meter at the top).

(Now I just need OpenAI to take me off the waitlist for GPT-4.)


I'm a bit confused: I tried uttering some queries in Esperanto and French, and it transcribed English (fine) translations. Can I disable this behavior and have the text transcribed in the language uttered?


I might be missing it but do we have an idea about the prompt that ChatGPT uses so we can replicate the experience?

I haven't played with the OpenAI API yet. Are there examples of good prompts to use to get good responses?


Love this! A few things we could add:

* Search feature

* Way to import/export chats

* Star/favourite replies by ChatGPT

* For GPT-4, provide 8k/32k model variations

* Prompt dictionary


I get a 404 error in the browser console for http://localhost:3000/encoderWorker.umd.js


This is exactly what I need, thank you for building this! We're using Azure Cognitive Services for API access to OpenAI models though. With any luck, expect a PR today for basic Azure support :)
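
For anyone curious what basic Azure support involves: the Azure-hosted endpoint mostly differs in URL shape and auth header. Roughly (resource name, deployment name, and api-version below are placeholders, not from the PR):

    async function azureChat(
      messages: { role: string; content: string }[],
      apiKey: string
    ): Promise<string> {
      const url =
        "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT" +
        "/chat/completions?api-version=2023-03-15-preview";
      const res = await fetch(url, {
        method: "POST",
        // Azure uses an `api-key` header rather than a Bearer token.
        headers: { "Content-Type": "application/json", "api-key": apiKey },
        body: JSON.stringify({ messages }),
      });
      return (await res.json()).choices[0].message.content;
    }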


Could I hook this up to one of text-generation-webui's API formats?


Would be so fun if you could fork a project on Vercel, i.e. this project has a fork button which forks its GitHub repo, makes a new project on your Vercel account (since it's connected to your GitHub), and opens a new tab with your project running.


Isn’t GPT a trademark owned by OpenAI? Is it legal to use it?


Looks like they've recently applied for the trademark but they haven't got it yet. I have no idea if they will get it or not; it is just an acronym, but they did come up with it.


They did seemingly position it as a generic name for this style of AI model, and other people have been using it in that fashion (e.g. "gpt-j"). It's usually recommended to contrast a brand name for your product with its generic name, so that the two don't become confused. That's why Scrabble is always subtitled "crossword game".


Agreed. I doubt that OpenAI's recent application seeking to trademark "GPT" will be approved. Maybe specific models/products, but not just "GPT" by itself...

To be able to register a trademark in the U.S., the applicant has to show that the proposed trademark is in fact "distinctive" of their company. The more generic a term is in its field, whether to begin with (i.e., by not becoming distinctive in the first place) or over time (i.e., by failing to maintain its distinctiveness), the less likely it is to be registrable. And such "distinctiveness" is notably harder to achieve and/or maintain for terms that are generic/descriptive rather than truly unique…

In the case of "GPT," in the context of software (specifically A.I.), those letters -- particularly in that combination -- are understood to stand for things that refer to a kind of A.I. language model having certain characteristics, even though OpenAI was first to produce a (g)enerative (p)retrained (t)ransformer and they're still the most notable provider of such technologies.


What's the use-case for this instead of the default UI?


Cross-platform compressed audio recording!? How!?


> Run locally on browser – no need to install any applications

This seems to be a contradiction. Am I running it locally, or is it running on someone else's server?


It simply means calling GPT APIs locally.


Kind of a gross misuse of the term, isn't it?


Speech-to-text didn't transcribe the text after a minute. The recording was 5s long (((


Yeah, unfortunately the OpenAI API hangs sometimes.


All your prompts are belong to us


Make it easier to try


Hey! I would love to. I seriously considered adding my own key into the app, and implementing some rate limiting to e.g. allow you to send 3 messages for free. But unfortunately that would require me to store some backend data on you that I do not want: I want this to be a completely "private" / FE-only application that stores no data on anyone.


Testing YakGPT right now, excellent work! I would recommend adding some screenshots to the GitHub README so that people can get an idea of how it looks before entering their API key.


Before you comment something like this, ask yourself "How would I make this easier to try?" The only reasonable answer is providing the OP's own API key, which is undesirable.


A video demonstration, a cleaner example of what it is, etc. You can experience it through observation.


Could you please add some screenshots of how it looks?


will do!



