xyc's comments

If anyone is interested in trying local AI, you can give https://recurse.chat/ a spin.

It lets you use local llama.cpp without setup, chat with PDFs offline, and organize your chat history in nested folders, and it can handle thousands of conversations. You can also import your ChatGPT history and continue those chats with local AI.


I don't see any indication that it runs on Linux; I'll stick with Jan, which is free.

Privacy could be one reason. There are a lot of cases where people do not want to send data to a cloud service.

Cloudflare has it https://developers.cloudflare.com/workers-ai/models/llava-1....

Locally it's actually quite easy to set up. I've made an app, https://recurse.chat/, which supports LLaVA 1.6. It takes a zero-config approach: you can just start chatting, and the app downloads the model for you.


Just realized I read your blog post about the LLaVA llamafile. It got me interested in local AI and led me to build the app :)

What's your reservation about running it locally?


The VS Code vs. Xcode situation is an exact counterexample to this: a non-optimized native app can be that much slower than a well-optimized Electron app.


That's not an apt comparison, because the difference could come from the non-UI portion. VS Code would be even faster if it were a native app.


Not really. VS Code does have some performance optimizations for which even the browser's machinery wouldn't suffice; for example, it implements its own scroll bar instead of using the native web scroll bar. But for the most part, the browser's rendering optimizations are the crucial factor. After years of optimization, a web browser isn't easy to beat.

A native app is just another set of abstraction layers. For comparison, SwiftUI doesn't render 500 list items quickly enough (https://www.reddit.com/r/swift/comments/18dlgv0/improving_pe...), and 500 is a tiny number for the web.


A user's personal data really doesn't have that much scale. Worst case, everything can be cached locally. I've imported thousands of chat sessions into a local AI chat app's database, and the total storage is under 30MB. Full-text search (with highlights and all) is almost instant.
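To give a sense of how little machinery this takes: here's a minimal sketch of local full-text search using SQLite's FTS5 extension from Python. The schema and sample rows are made up for illustration, and it assumes your SQLite build has FTS5 compiled in (most do):

    import sqlite3

    conn = sqlite3.connect("chats.db")

    # FTS5 virtual table; text is tokenized and indexed on insert.
    conn.execute("""
        CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
        USING fts5(session_title, body)
    """)

    conn.executemany(
        "INSERT INTO messages_fts (session_title, body) VALUES (?, ?)",
        [("Trip planning", "What are good hikes near Kyoto?"),
         ("Trip planning", "How long is the train from Tokyo to Kyoto?")],
    )

    # MATCH hits the inverted index; highlight() wraps matches in the
    # given markers (column 1 = body).
    rows = conn.execute("""
        SELECT highlight(messages_fts, 1, '[', ']')
        FROM messages_fts
        WHERE messages_fts MATCH 'kyoto'
    """).fetchall()

    for (snippet,) in rows:
        print(snippet)

Because FTS5 maintains an inverted index, MATCH queries stay near-instant even across tens of thousands of rows, which is consistent with the numbers above.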


Check out https://recurse.chat (I'm the dev). You can import ChatGPT messages. It has almost instant full text search over thousands of chat sessions. Also supports llama.cpp, local embedding / RAG, and most recently bookmarks and nested folders.


You can run llama.cpp and get structured output with GBNF grammars. There are tools to convert a JSON schema to GBNF.
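As an illustration, the llama-cpp-python bindings expose this directly: LlamaGrammar.from_json_schema does the JSON-schema-to-GBNF conversion for you, and passing grammar= constrains sampling to strings the grammar accepts. A minimal sketch (the model path and schema are placeholders):

    import json
    from llama_cpp import Llama, LlamaGrammar

    # Example schema: force the model to emit {"city": ..., "temperature_c": ...}
    schema = json.dumps({
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "temperature_c": {"type": "number"},
        },
        "required": ["city", "temperature_c"],
    })

    # Converts the JSON schema into a GBNF grammar internally.
    grammar = LlamaGrammar.from_json_schema(schema)

    llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf")  # placeholder path

    out = llm(
        "Reply with JSON describing the weather in Paris.",
        grammar=grammar,   # sampling is constrained to grammar-valid output
        max_tokens=128,
    )
    print(out["choices"][0]["text"])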


I have been using local LLMs as a daily driver and built https://recurse.chat for it. I've mostly used Llama 3, WizardLM 2, and Mistral, and sometimes I just try out models from Hugging Face (recently added support for adding models directly from Hugging Face: https://x.com/recursechat/status/1794132295781322909).


Seems that no client-side changes are needed for gpt-4o chat completion.

Added a custom OpenAI endpoint to https://recurse.chat (I built it) and it just works: https://twitter.com/recursechat/status/1790074433610137995
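For reference, the drop-in behavior makes sense given the API shape; with the official openai Python client, nothing changes except the model name (the prompt here is just an example):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Same chat completions call as older models; only the model name changes.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(resp.choices[0].message.content)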


But does it do the full multimodal in-out capability shown in the app? :)


We'll see :) I heard the video capability is rolling out later.


API access is text/vision for now: https://x.com/mpopv/status/1790073021765505244


If you have suggestions for making RecurseChat more useful for you, especially for RAG, I'd love to hear them!

