Hacker News | yichuan's comments

@Berkeley SkyLab, we’re the first to bring semantic search to Claude Code with a fully local index in a novel, lightweight structure — check it out at LEANN (https://github.com/yichuan-w/LEANN). Unlike Claude-context, which uploads all your data to the cloud, or Serena, which is heavy and limited to keyword search, our solution installs in just 1 minute and instantly enhances Claude Code’s capabilities.


I think there’s huge potential for a fully local “Cursor-like” stack — no cloud, no API keys, just everything running on your machine.

The setup could be:

• Cursor CLI for agentic/dev stuff (example: https://x.com/cursor_ai/status/1953559384531050724)

• A local memory layer compatible with the CLI — something like LEANN (97% smaller index, zero cloud cost, full privacy, https://github.com/yichuan-w/LEANN) or Milvus (though Milvus often ends up cloud/token-based)

• Your inference engine, e.g. Ollama, which is great for running OSS models locally

With this, you’d have an offline, private, and blazing-fast personal dev + AI environment. LEANN in particular is built exactly for this kind of setup — tiny footprint, semantic search over your entire local world, and Claude Code / Cursor compatibility out of the box, with Ollama handling generation (rough sketch of the loop below). This setup is not only free, it doesn't need any API keys at all.
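To make the shape of that loop concrete, here's a minimal sketch using a naive in-memory index plus Ollama's local HTTP API. The model names, toy documents, and prompt format are all placeholder assumptions, and a real setup would swap the numpy index for LEANN:

    # Minimal local retrieval + generation loop: embed with Ollama, retrieve with
    # a naive in-memory index, generate with a local model. No cloud, no API keys.
    # Assumes `ollama serve` is running and the two models below have been pulled.
    import requests
    import numpy as np

    OLLAMA = "http://localhost:11434"
    EMBED_MODEL = "nomic-embed-text"   # assumption: any local embedding model works
    CHAT_MODEL = "llama3.1"            # assumption: any local chat model works

    def embed(text: str) -> np.ndarray:
        r = requests.post(f"{OLLAMA}/api/embeddings",
                          json={"model": EMBED_MODEL, "prompt": text})
        return np.array(r.json()["embedding"], dtype=np.float32)

    # Toy "documents"; in practice these are chunks of your code and notes.
    docs = ["def parse_config(path): ...",
            "Notes on the deploy pipeline",
            "README for the search index"]
    index = np.stack([embed(d) for d in docs])
    index /= np.linalg.norm(index, axis=1, keepdims=True)

    def retrieve(query: str, k: int = 2) -> list[str]:
        q = embed(query)
        q /= np.linalg.norm(q)
        top = np.argsort(index @ q)[::-1][:k]    # cosine-similarity ranking
        return [docs[i] for i in top]

    def ask(query: str) -> str:
        context = "\n".join(retrieve(query))
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        r = requests.post(f"{OLLAMA}/api/generate",
                          json={"model": CHAT_MODEL, "prompt": prompt, "stream": False})
        return r.json()["response"]

    print(ask("Where is the deploy pipeline documented?"))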

I do agree this takes some effort to set up, but maybe someone can package it so it's easy and fully open source.


It might be free, private, and blazing fast (if you choose a model whose parameter count matches your GPU).

But you'll quickly notice that it's not even close to matching the quality of output, reasoning, and reflection you'd get from running the same model family at a much higher parameter count on hardware with over 128 GB of actual VRAM.

There isn't anything available locally that will let me load a 128 GB model and get anything above 150 tokens/sec.
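Rough back-of-envelope (illustrative numbers, batch-1 decode, ignoring multi-GPU aggregation, MoE sparsity, and speculative decoding, all of which can beat this bound):

    # Batch-1 decoding is memory-bandwidth-bound: every generated token streams
    # the full weight set through the chip, so tok/s <= bandwidth / model size.
    # Bandwidth figures below are rough assumptions, not benchmarks.
    model_bytes = 128e9   # a ~128 GB (quantized) model resident in memory
    bandwidths = {
        "RTX 4090 (~1 TB/s, 24 GB, can't even hold the model)": 1.0e12,
        "Apple M-series unified memory (~0.8 TB/s)": 0.8e12,
        "single H100 (~3.3 TB/s, 80 GB, still too small)": 3.3e12,
    }
    for name, bw in bandwidths.items():
        print(f"{name}: ~{bw / model_bytes:.0f} tok/s ceiling")
    # Even ignoring the capacity problem, the ceilings land around 6-26 tok/s,
    # nowhere near 150.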

The only thing local AI models make sense for right now seems to be Home Assistant, to replace your Google Home/Alexa.

Happy to be proven wrong, but the effort-to-reward ratio just isn't there for local AI.


Because most of the people squeezing a highly quantized small model into their consumer GPU don't realize they've left no room for the activations and KV cache, and end up stuck with a measly small context window.
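To put numbers on it (model shape is illustrative, roughly Llama-3-8B-like, not any specific setup):

    # KV cache per token = 2 (K and V) * n_layers * n_kv_heads * head_dim * bytes.
    # The weights are not the only thing that has to fit in VRAM.
    n_layers, n_kv_heads, head_dim = 32, 8, 128   # assumed model shape
    bytes_fp16 = 2

    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_fp16
    print(f"KV cache per token: {kv_per_token / 1024:.0f} KiB")   # ~128 KiB

    for ctx in (8_192, 32_768, 128_000):
        print(f"context {ctx:>7}: ~{kv_per_token * ctx / 2**30:.1f} GiB of KV cache")
    # On a 24 GB card already mostly filled by quantized weights, the long
    # contexts simply don't fit, hence the measly context window.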




You should probably disclose everywhere you comment that you're advertising for Leann.


That's my vision; I hope it helps. I think that if we combine all our personal data and organize it effectively, we can be 10x more efficient. Long-term AI memory: everything you say and see gets quietly loaded into your own personal AI, and that could solve a lot of problems, I think. https://x.com/YichuanM/status/1953886817906045211


I guess for semantic search (rather than keyword search), the index is larger than the text because we have to embed each chunk into a high-dimensional semantic space, which makes sense to me.
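Quick illustration (chunk size and embedding dimension are assumptions):

    # Why a vector index can outgrow the text it indexes.
    chunk_chars = 1_000          # roughly a 200-word chunk (~1 KB of text)
    dim = 768                    # a common embedding dimension
    vector_bytes = dim * 4       # float32 -> 3,072 bytes per chunk

    print(f"text chunk:      ~{chunk_chars} bytes")
    print(f"one embedding:   {vector_bytes} bytes")
    print(f"overhead factor: ~{vector_bytes / chunk_chars:.1f}x, before graph links and metadata")
    # That per-chunk overhead is what LEANN's "97% smaller index" claim
    # is about cutting down.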

