Hacker News
Show HN: Khoj – Chat offline with your second brain using Llama 2 (github.com/khoj-ai)
565 points by 110 on July 30, 2023 | 150 comments
Hi folks, we're Debanjum and Saba. We created Khoj as a hobby project 2+ years ago because: (1) search on the desktop sucked; we had only keyword search on the desktop vs Google for the internet; and (2) natural language search models had become good and easy enough to run on consumer hardware by that point.

Once we made Khoj search incremental, I completely stopped using the default incremental search (C-s) in Emacs. Since then Khoj has grown to support more content types, deeper integrations, and chat (using ChatGPT). With Llama 2 released last week, chat models are finally good and easy enough to use on consumer hardware for the chat-with-docs scenario.

Khoj is a desktop application to search and chat with your personal notes, documents and images. It is accessible from within Emacs, Obsidian or your web browser. It works with org-mode, markdown, PDF and JPEG files, as well as Notion and GitHub repositories. It is open source and can work without internet access (e.g. on a plane).

Our chat feature allows you to extract answers and create content from your existing knowledge base. Example: "What was that book Trillian mentioned at Zaphod's birthday last week?" We personally use the chat feature regularly to find links, names and addresses (especially on mobile) and to collate content across multiple, messy notes. It works online or offline: you can chat without internet using Llama 2, or with internet using GPT3.5+, depending on your requirements.

Our search feature lets you quickly find relevant notes, documents or images using natural language. It does not use the internet. Example: searching for "bought flowers at grocery store" will find notes about "roses at wholefoods".

Quickstart:

  pip install khoj-assistant && khoj
See https://docs.khoj.dev/#/setup for detailed instructions

We also have desktop apps (in beta) at https://github.com/khoj-ai/khoj/releases/tag/0.10.0 if you want to try them out.

Please do try out Khoj and let us know if it works for your use cases. Looking forward to the feedback!




Just a heads up, your landing page on your website doesn't seem to mention Llama/the offline use case at all, only online via OpenAI.

----

What model size/particular fine-tuning are you using, and how have you observed it to perform for the use case? I've only started playing with Llama 2 at the 7B and 13B sizes, and I feel they're awfully RAM-heavy for consumer machines, though I'm really excited by this possibility.

How is the search implemented? Is it just an embedding and vector DB, plus some additional metadata filtering (the date commands)?


Thanks for the pointer, yeah the website content has gone stale. I'll try to update it by end of day.

Khoj is using the Llama 7B, 4-bit quantized, GGML build by TheBloke.

It's actually the first offline chat model that gives coherent answers to user queries given notes as context.

And it's interestingly more conversational than GPT3.5+, which is much more formal.


This is a super cool project. Congrats! If you're looking at trying different models with one API, check out an open-source project a few folks and I have been working on in July, in case it's helpful: https://github.com/jmorganca/ollama

Llama 2 gives great answers, even the 7B model. There's an "uncensored" 7B version as well that George Sung has fine-tuned for topics the default Llama 2 model won't discuss - e.g. I had trouble getting Llama 2 to review authentication/security code or topics: https://huggingface.co/TheBloke/llama2_7b_chat_uncensored-GG...

From just playing around with it, the uncensored model still seems to know where to "draw the line" on sensitive topics, but YMMV.

If you do end up checking out Ollama, you can try it with this command, or there's an API too (it's not in the docs yet):

  ollama run llama2-uncensored


This (ollama) is neat! Thanks for the pointers.

Yeah, I ran into a couple of funny edge cases using Llama v2 with my personal notes. For example, if I ever asked it anything remotely personal (as I would with a personal assistant), it would often start telling me that asking for personal data is unethical. I get it, you have to be careful with the open source LLMs, but still a bit funny. It does work with enough coaxing though.


That's actually pretty embarrassing given the product's purpose. I believe there are uncensored Llama models out there that might be worth a shot.


Oh interesting, so you're not using Llama 2, you're using the original. Have you begun to evaluate Llama 2 to determine the differences in performance?

How are you determining which notes (or snippets of notes?) get injected as context? Especially given the small 2048-token context limit with Llama 1.


Quick clarification: we are using Llama v2 7B. We didn't experiment with Llama 1 because we weren't sure of the licensing limitations.

We determine note relevance by using cosine similarity between the query and the knowledge base (your note embeddings). We limit the context window for Llama 2 to 3 notes (while OpenAI might comfortably take up to 9). The notes are ranked from most to least similar and truncated based on the context window limit. For the model we're using, we're still limited to 2048 tokens for Llama v2.
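For illustration, a minimal sketch of that retrieval step (not Khoj's actual code; the embedding model name is an assumption, while the top-3 limit and the 2048-token budget come from the description above):

  # Rank notes by cosine similarity to the query, then fit the top-k into a token budget.
  from sentence_transformers import SentenceTransformer, util

  encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model; any bi-encoder works

  def build_context(query, notes, top_k=3, max_tokens=2048):
      query_emb = encoder.encode(query, convert_to_tensor=True)
      note_embs = encoder.encode(notes, convert_to_tensor=True)  # precomputed in practice
      scores = util.cos_sim(query_emb, note_embs)[0].tolist()
      ranked = [note for _, note in sorted(zip(scores, notes), reverse=True)]
      context, budget = [], max_tokens
      for note in ranked[:top_k]:
          words = note.split()  # crude stand-in for a real tokenizer
          context.append(" ".join(words[:budget]))
          budget -= len(words)
          if budget <= 0:
              break
      return "\n\n".join(context)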


Have you looked at using the long context (32K) version of the Llama v2 7B released by Together AI?

https://together.ai/blog/llama-2-7b-32k


Oh neat, thanks for sharing that! Having a 32K offline model is pretty promising. Let me test out how it performs


I thought Llama v2 has a context window of 4096?


Why only the 7B version? Would it be possible to add support for the 13B as well, if someone has enough RAM to run it?


We're still trying to figure out the right balance between configurability and ease of use/maintenance.

The 7B version was a decent enough starting point in terms of what it can answer (and way fewer folks can run a 13B on their machine).

If you really want, you can just replace the 7B model file with the 13B one under the ~/.cache/gpt4all directory on your device, and it should just work.


Is a vector db used?


> Just a heads up, your landing page on your website doesn't seem to mention Llama/the offline usecase at all, only online via OpenAI.

I am sufficiently uneducated on the ins and outs of AI integrations to always wonder if projects like this one can be used in local-only mode, i.e. when self-hosted, ensuring that none of my personal information is ever sent to a remote service. So it would be very helpful to very explicitly give me that assurance of privacy, if that's the case.


Yes, this seems to be a common concern. We're trying to see how best to crisply address it. But yes, Khoj can be used even with your internet turned off.


Really cool to see this! Local is the real future of AI.

I got really excited about this and fired it up on my petite little M2 Macbook Air, only for it to grind to a halt. Think the old days when you had a virus on your PC and you'd move the mouse, then wait 45 seconds to see the cursor move. It honestly made me feel nostalgic. I guess I have to temper performance expectations with this Air, though this is the first time it's happened.


Just wait 10 years when computers have a dedicated AI-PU and you don't have to worry about freezing anything up to talk to your bot.


I wonder if edge TPUs (sold here: https://coral.ai/products/) could help, though I guess the community would have to optimize for them instead of standard hardware.


The M1 line of Macs does have that unit.


How much memory do you have in your Macbook? The 7B models seem to work well with at least 16GB of unified memory, but I’ve seen Macs with 8GB really struggle.


Indeed it's just a poor little 8GB of RAM.


Could this do something like take in the contents of my web history for the day and summarize notes on what I've been researching?

This is getting very close to my ideal of a personal AI. It's only gonna be a few more years until I can have a digital brain filled with everything I know. I can't wait


That would be pretty awesome. Building a daily web history summarizer as a browser extension shouldn't be too much work. I bet there's something like that already out there.

Having something that indexes all your digital travels and makes it easily digestible will be gold. Hopefully Khoj can become that :)


> I bet there's something like that already out there.

There was.

It was called Google Desktop Search, it was awesome, and it was axed.

That said, today I wouldn't use it anyway as both I and Google have changed a lot.


Oh yeah, I used to use Google Desktop too. It was awesome for its time.



Have you used this? Looks fairly interesting.


I've been using Rewind for a while and find quite a bit of value in it. Because it uses GPT-4 for Ask Rewind, it's not local-only for that feature, but I find myself just using the local search most of the time and it works fairly well (and I have it firewalled off, so I know they're at least not lying about the data staying local except for updates, analytics [if you opt in], and Ask Rewind usage).


Interesting, this is the exact question that came to mind for me. This would address a pain point for me.

Does anyone have recommendations for a tool that does it?

Or, anyone want to build it together?


What's the posthog telemetry used for? Why is there nothing on it in the docs? Why no clear way to opt out?


Thanks for pointing that out!

We use it for understanding usage -- like determining whether people are using markdown or org more.

Everything collected is entirely anonymized, and no identifiable information is ever sent to the telemetry server.

To opt out, you set the `should-log-telemetry` value in `khoj.yml` to false. Updated the docs to include these instructions and what we collect -- https://docs.khoj.dev/#/telemetry.
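For anyone scripting the opt-out, a minimal sketch of flipping that flag (assuming the ~/.khoj/khoj.yml location mentioned elsewhere in this thread, and that the file parses as plain YAML):

  import yaml  # pip install pyyaml
  from pathlib import Path

  config_path = Path.home() / ".khoj" / "khoj.yml"
  config = yaml.safe_load(config_path.read_text())
  config["should-log-telemetry"] = False  # opt out of telemetry
  config_path.write_text(yaml.safe_dump(config))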


It's pretty easy to remove, which is what I ended up doing. The project works remarkably well otherwise.


This seems like a cool project.

It would be awesome if it could also index a directory of PDFs, and if it could do OCR on those PDFs to support indexing scanned documents. Probably outside the scope of the project for now, but just the other day I was thinking how nice it would be to have a tool like this.


Yeah being able to search and chat with PDF files is quite useful.

Khoj can index a directory of PDFs for search and chat. But it does not currently work with scanned PDF files (i.e. ones without selectable text).

Being able to work with those would be awesome. We just need to get to it. Hopefully soon!


Check out pdftotext; it's a CLI tool (maybe a library too) that makes PDF text selectable. Oh sorry, I meant to say ocrmypdf. But hey, maybe it's worth checking both.


I've wanted a crawler on my machine for auto-categorizing, organizing, tagging and moving ALL my files across all my machines - so the ability to crawl PDFs, downloads, screenshots, pictures, etc. and give me a logical tree of the organization of the files - and allow me to modify it by saying "add all PDFs related to [subject] here and then organize by source/author etc... and then move all my screenshots, ordered by date, here"

etc...

I've wanted a "COMPUTER.", uh... I say "COMPUTER!", 'sir, you have to use the keyboard', ah a Keyboard, how quaint.... forever.


That.would.be.awesome! Khoj isn't there yet, but that actually shouldn't be too far away if you give it a voice interface and terminal access.

Of course, having it be stable enough to not `rm -rf /` soon after is definitely not part of the warranty


Copy first, move and delete after confirmation from user on big moves.

-

So aside from doing something such as this - what's the size of your current use case - is it being able to search and organize a large org of devs through files and directories/repos/whatever?

What if you had a global glob of data/code/files - and as an enterprise you can just give each new emp an employee role, and based on their job and purported skills, just throw them at the monolith, yet the AI will tailor the info they get out of the glob.

"User10267A - you are chartered with working on projects XYZ, based on your skillset - here are a few assessments to see which sections of the codebase best suit you."

[does work]

"OK .U267A, here is your workspace with all auths to the libs and such youll need to work on [PROJECT FSPEZ]"


I see you’re using gpt4all; do you have a supported way to change the model being used for local inference?

A number of apps that are designed for OpenAI’s completion/chat APIs can simply point to the endpoints served by llama-cpp-python [0], and function in (largely) the same way, while using the various models and quants supported by llama.cpp. That would allow folks to run larger models on the hardware of their choice (including Apple Silicon with Metal acceleration or NVIDIA GPUs) or using other proxies like openrouter.io. I enjoy openrouter.io myself because it supports Anthropic’s 100k models.

[0]: https://github.com/abetlen/llama-cpp-python
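For reference, a sketch of that pattern with the (pre-1.0) openai Python client; the port and /v1 path are llama-cpp-python's server defaults, and the model name is just a placeholder:

  import openai

  # Point the OpenAI client at a local llama-cpp-python server,
  # started with e.g.: python -m llama_cpp.server --model <model.bin>
  openai.api_base = "http://localhost:8000/v1"
  openai.api_key = "not-needed-locally"

  reply = openai.ChatCompletion.create(
      model="local-model",  # placeholder; many local servers ignore this
      messages=[{"role": "user", "content": "Summarize my notes on topic X"}],
  )
  print(reply["choices"][0]["message"]["content"])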


The point of gpt4all is that you can change the model with minimal breakage. You should be able to change this line https://github.com/khoj-ai/khoj/blob/master/src/khoj/process... to the model you want. You'll need to build your own local image with docker-compose, but it should be relatively straightforward.


Yeah, the gpt4all project is super neat. If folks are inclined enough, it should be fairly straightforward to clone the Khoj project and swap out the model used. You'd have to update the model type in a few places, but that should be easy enough with normal string/keyword search. Then run it directly on your machine. You will, however, have to go in and modify the prompt structure to fit the model's expectation. Some guidance on that in this PR with Falcon: https://github.com/khoj-ai/khoj/pull/330/files#diff-7fa11396...

I'll share my insight from experimenting with integrating Llama v2/GPT4All into Khoj -- Falcon 7B is probably the runner-up among models that can be supported on consumer hardware, and it really wasn't good enough (for me) on my machine to be useful. The token consumption with personal notes as context is too large, and the content too variable, for a small model like that to be able to understand it. It's fine if you're just doing normal question-answering back and forth, but you don't need Khoj for that.
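To make the prompt-structure point concrete: Llama 2 chat models expect the [INST]/<<SYS>> template below, so a swapped-in model like Falcon needs its own wrapper (the Falcon format shown is a common convention, not Khoj's actual prompt; the system prompt text is made up):

  # Llama 2 chat template (what the model was fine-tuned on).
  def llama2_prompt(system: str, user: str) -> str:
      return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

  # A typical Falcon-instruct style wrapper, for comparison.
  def falcon_prompt(system: str, user: str) -> str:
      return f"{system}\nUser: {user}\nAssistant:"

  print(llama2_prompt("You are Khoj, a personal assistant.", "Where did I stay in Houston?"))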


No, we don't yet. Lots of developer folks want to try different models, but we want to provide simple-to-use yet deep assistance. Kind of unsure what to focus on given our limited resources.


I really like the idea of running a dedicated server that serves up various large language models via a standardized API, and then Khoj could just be pointed at one. Depending on the notes and the type of conversation I want to have, that would even allow for Khoj to swap models on the fly.


We're planning to work on this!


As someone who's been getting into using Obsidian and messing around with chat AIs, this is excellent, thank you!


Thanks! Do try it out and let us know if it works for your use case!


Really encourages me to move to Obsidian :D


I have not tried it, but something like this should exist. I don't think it is going to be as usable on consumer hardware yet unless you have a good enough GPU, but within a couple of years (or less) we'll be there, I am sure.

Irrelevant opinion - the logo is beautiful, I like it, and so are the colours used.

Lastly, Llama 2 for such use cases is, I think, capable enough that paying for ChatGPT won't be as attractive, especially when privacy is a concern.

Keep it up. Good craftsmanship. :)


Thanks! I do think Llama V2 is going to be a good enough replacement for ChatGPT (aka GPT3.5) for a lot of use cases.


I've been playing with Khoj for the past day - it's really neat, well done!

A few observations:

1. Telemetry is enabled by default, and may contain the API and chat queries. I've logged an issue for this along with some suggestions here: https://github.com/khoj-ai/khoj/issues/389

2. It would be advantageous to have configuration in the UI rather than baking its YAML into the container image. (Added a note on that in the aforementioned issue on Github.)

3. It's not clear if you can bring your own models, e.g. can I configure a model from huggingface/gpt4all? If so, will it be automatically downloaded based on the name, or should I put the .bin (and yaml?) in a volume somewhere?

4. AMD GPU/APU acceleration (CLBLAS) would be really nice, I've logged an issue for this feature request as well. https://github.com/khoj-ai/khoj/issues/390


Thanks for the feedback! Much appreciated.

I responded in the issue, but I'll paste here as well for those also curious:

Khoj does not collect any search or chat queries. As mentioned in the docs, you can see our telemetry server[1]. If you see anything amiss, point it out to me and I'll hotfix it right away. You can see all the telemetry metadata right here[2].

[1]: https://github.com/khoj-ai/khoj/tree/master/src/telemetry

[2]: https://github.com/khoj-ai/khoj/blob/master/src/khoj/routers...

Configuration with the `docker-compose` setup is a little bit particular; see the issue^ for details.

Thanks for the reference points for GPU integration! Just to clarify, we do use GPU optimization for indexing, but not for local chat with Llama. We're looking into getting that working.


Would it be possible to support a custom URL for the local model, such as what running ./server in ggml would give you?

This may be more difficult if you are pre-tokenizing the search context.

Very cool project.


It's funny that you mention `C-s`, because `isearch-forward` is usually used for low-latency literal matches. In what workflow can Khoj offer acceptable latency or superior utility as a drop-in replacement for isearch? Is there an example of how you might use it to navigate a document?


That's (almost) exactly what Khoj search provides: a search-as-you-type experience, but with a natural language (instead of keyword) search interface.

My workflow looks like:

1. Search with Khoj[1]: `C-c s s` <search-query> RET

2. Use speed keys to jump to the relevant entry[2]: e.g. with `n n o 2`

[1]: `C-c s` is bound to the `khoj` transient menu

[2]: https://orgmode.org/manual/Speed-Keys.html


Can you elaborate on that a little bit? When you search like that, what do you type in to find stuff in a programming language source file? I'd like to better understand this workflow, it seems interesting and I might be missing out.


Firstly, my previous statement needs quite a lot of clarification:

Khoj works more like an incremental, natural language version of org-agenda-search (or projectile) rather than isearch.

You have to configure which files it should index first. You can then use natural language to search those files with a search-as-you-type experience.

This is not the same as isearch, which just searches the current file for keyword matches.

Second, khoj doesn't index source code and the default search models don't work well with code files.

But yeah, someone should implement a natural language isearch as a standalone tool (as suggested by the parent comment). It'd be super useful!


Awesome work, I've been looking for something like this. Any plans to support Logseq in the future?


Yes, we hope to get to it soon! This has been an ask on our Github for a while[1].

[1]: https://github.com/khoj-ai/khoj/issues/141


Heads up, docker build fails with:

  #12 2.017 ERROR: Could not find a version that satisfies the requirement pyside6>=6.5.1 (from khoj-assistant) (from versions: none)
  #12 2.017 ERROR: No matching distribution found for pyside6>=6.5.1
  ------
  executor failed running [/bin/sh -c sed -i 's/dynamic = \["version"\]/version = "0.0.0"/' pyproject.toml && pip install --no-cache-dir .]: exit code: 1


Darn, I've seen this error a couple of times. Can you drop a couple of details in this Github issue? https://github.com/khoj-ai/khoj/issues/391

I'm particularly interested in your OS/build environment.


Sure thing! I left a comment.

The "buildx" flag gets me past that one, and to the next error:

#0 12.37 ERROR: Could not find a version that satisfies the requirement gpt4all>=1.0.7 (from khoj-assistant) (from versions: 0.1.5, 0.1.6, 0.1.7)

#0 12.37 ERROR: No matching distribution found for gpt4all>=1.0.7


Two comments:

1. If you want better adoption, especially among corporations, GPL-3 won't cut it. Maybe think of some business-friendly licenses (MIT etc.).

2. I understand the excitement about LLMs. But how about making something more accessible to people with regular machines, not state-of-the-art ones? I use ripgrep-all (rga) along with fzf [1], which can search all files, including PDFs, in specific folders. However, I would like a GUI tool to

   (a) search across multiple folders, 

   (b) provide priority of results across folders, filetypes and 

   (c) store search histories where I can do a meta-search. 
This is sufficient for 95% of my use cases to search locally, and I don't need an LLM. If Khoj can enable such search as the default, without an LLM, that will be a gamechanger for many people who don't have a heavy-compute machine or don't want to use OpenAI.

[1] https://github.com/phiresky/ripgrep-all/wiki/fzf-Integration


Just a note to suggest that giving away your hard work to those who will profit from it in the hope that they will remember you later seems like a pretty dubious exchange.

Have a look at how that worked out for the folks who built node and its libraries versus the ones who maintained control of their work (like npm).


What happened there? Surely the people who built node (or the people building the most popular fork, at least) get to define what the default package manager is, etc., and get some BATNA against the likes of a third-party package manager profiting from their thing. I don't know what the node/npm relationship story is though.


If corporations have no issue with using restrictive proprietary licenses, they should not have any issues with the GPL.


That seems like a pretty trivial thing to implement. Why not do it yourself?


Hey, I saw Khoj hit HN a few weeks ago and get slaughtered because the messaging didn't match the product.

You've come a good way in both directions: the messaging is clearer about current state vs aspirations, and you've made good progress towards the aspirational parts.

Really glad to see the warm reception you're getting now. Nice job, y'all.


Hey ubertaco! I remember you. Appreciate the well-wishes. The landing page still needs some tweaking. It's kind of hard keeping what you're building in sync with what you're aspiring for, but we're definitely working towards it.


What's the recommended 'size' of the machine to run this?

I tried to run it on a pretty beefy machine (8-core CPU/32 GB RAM) with ~40-odd PDF documents. My observation is that chat queries take forever, and I'm also getting Segmentation fault (core dumped) on every other query or so.


Thanks for the feedback. Does your machine have a GPU? 32GB CPU RAM should be enough, but a GPU speeds up response time.

We have fixes for the seg fault[1] and improvements to the query speed[2] that should be released by end of day today[3].

Update Khoj to version 0.10.1 later today to see if that improves your experience:

  pip install --upgrade khoj-assistant

The number of documents/pages/entries doesn't scale memory utilization as quickly, and doesn't affect search and chat response times as much.

[1]: The seg fault would occur when folks sent multiple chat queries at the same time. A lock and some UX improvements fixed that.

[2]: The query time improvements are done by increasing batch size, trading off increased memory utilization for more speed.

[3]: The relevant pull request for reference: https://github.com/khoj-ai/khoj/pull/393


Has anyone gotten something valuable from talking to their second brain? What kinds of conversations are you trying to have?


Traumatic Brain Injury. I can’t remember yesterday.

Would be hella nice to connect all the scattered lines of thoughts in various notes on a variety of platforms.


If you're on Mac I would strongly recommend Notational Velocity (or the Alt version), if they still run (I know Apple likes to break compatibility).

I've tried dozens of notetaking apps and that's the only one that truly felt like a second brain.

It's because of the speed. Infuriatingly, Obsidian for example can search just as fast, but they intentionally programmed in a lag after each keystroke... (I know because I removed it.)


Dear Lord, why would they do such a thing. I think I've experienced this, and decided I hated Obsidian because it made my computer feel slow (it's not).


I don't know about Obsidian, but a delay before search prevents a bunch of useless queries on the prefixes of your search terms. If search is slow, adding a delay to prevent extra searches might make searching feel faster.

Alternatively, you could get near-zero delay and no spurious queries by requiring the user to press Enter or click a button... but that design is much less common these days.


Re: "useless searches": my point in the comment above was that this is what Notational Velocity does--instantly updates search list on every keystroke--and it's the reason I like it significantly more than anything else I've tried.


> they intentionally programmed in a lag after each keystroke

Yeah, it seems they've added a debounce. I'd prefer to set it to 0ms as well. Do you remember how you removed it?


I followed this guide for modifying Electron apps.

https://github.com/jonmest/How-To-Tamper-With-Any-Electron-A...

Obsidian is not open source, so it's minified and hard to read. But I was able to find the relevant code and just set the delay to 0.

(I'm away from computer now, I'll see if I can find the code later.)

What also helped is that all Electron apps are just Chromium so you can run the dev tools and the debugger! I think the hotkey is F12, and/or Ctrl+Shift+J.


It has been removed in Obsidian 1.4.2: https://obsidian.md/changelog/2023-07-31-desktop-v1.4.2/


I need this response too.


Thanks for your feedback, this has been removed in Obsidian 1.4.2 https://obsidian.md/changelog/2023-07-31-desktop-v1.4.2/


That's really cool, I appreciate that!

However, updating Obsidian actually made it slower for me, because 1.4.2 is only available to paid users. So it updated to 1.3.7 and removed my patch!


If you're on Windows, check out TimeSnapper. The classic version is free and works fine.

It screencaps your desktop every 5 seconds so you can watch a timelapse of how you spent your day. (Assuming it was on the computer!)

I did find it heavy on disk usage, so I wrote an ffmpeg script to convert it to video (much more efficient).


Wow, that's an intense use case. I don't know how, but we'd love to be able to support this.

If you can collate your notes into markdown or some such, then messy notes can be handled, at least using Khoj with GPT3.5+.

Do let us know how we can help out and what your current biggest pain points are.


I am sorry.

Would a summary of the previous day be helpful to you? Is your memory problem only episodic, or does it extend to factual and kinesthetic memory as well?


I want a body cam that I wear that transcribes the things I did into something searchable...

Basically like a GoPro on steroids with searchable context - or even the ability for me to say out loud "KEEP A NOTE OF THIS" and it will keep a segment tagged and can give me summaries of moments I wanted particularly logged...

I applied to YC with an idea 'sorta' like this almost a decade ago.

The idea was to have a timeline of communications between all my contacts, such that I could side-scroll a timeline with dots for actions such as "sent email", "made call", "sent text", "received text", and I could see all these filtered by contact/day/whatever...

This was pre-Snowden, so I didn't have confirmation that there were already people doing this for me, just not letting me browse my own data ;-)


> Basically like a GoPro on steroids with searchable context - or even the ability for me to say out loud "KEEP A NOTE OF THIS" and it will keep a segment tagged and can give me summaries of moments I wanted particularly logged...

This is generally called Lifelogging. https://roberdam.com/en/wisper.html - roberdam@ created basically what you just said, but focused on Audio, not Video.

https://news.ycombinator.com/item?id=29692087 has some possible info too.


Yeah, it bugs me that I don't know where I was a year ago but my phone company does.

Can I get that via GDPR? Has anyone tried?

For Android users a more straightforward option is location history, but you should probably turn that off.


Likewise! That was one of the impulses behind working on Khoj -- we have all this data about places we go, things we do, websites we visit, but such poor tooling for actually retrieving that information right now.

For example, if I stayed at an Airbnb last year in Houston and needed to look up the address for some reason, I'd be going either to Gmail and running some keyword searches ("Houston", "Airbnb"), or going to my Airbnb app.

Really, I want a single endpoint where all my personal data can be made available to me, ideally without sacrificing my privacy. Location is a cool use case.


Might look into some tools like novoid's Memacs. The notion here is to build tools that push feeds and history data into Emacs. Using org in your use case, with the Khoj tool, could be the "glue" you need to tie it all together: https://github.com/novoid/Memacs#readme


Why should you turn that off? If you're afraid of being tracked, it's too late; you're already being tracked by your carrier via the IMEI of your phone, without your consent. Location history is there for your convenience, so you can relive where you were a year ago.


If you do not understand basic concepts like this, perhaps this isn't a forum where you should interact.


You might want to check the age of our respective accounts.


I quite like this concept. It would be neat if you could relay the data to a personal server for processing and insight extraction. Seems feasible with a phone camera. I think GoPros would be limited by battery life (in my experience).


I wonder if this is a better use case for “smart” eyeglasses? Audio as the input at first, have the audio files sync wirelessly to your phone, and apply the ML transcription and prompt keys locally.


rewind.ai might be able to help


This is very cool, the Obsidian integration is a neat feature.

Please, someone make a home-assistant Alexa clone for this.


Thanks!

We've just been testing integrating over voice and WhatsApp over the last few days[1][2] :)

[1]: https://github.com/khoj-ai/khoj/tree/khoj-chat-over-whatsapp...

[2]: https://github.com/khoj-ai/khoj/compare/master...features/wh...


I’m not a software dev.

Is there a way to have this bot read from Discord and Google Drive?


gpt4all itself (the library on the backend for this) has a similar program [1]. You just need to put everything into a folder. This should be straightforward for Google Drive. Harder for Discord, though, but I'm sure there's a bot online that can do the extraction.

[1] https://gpt4all.io/index.html


Would anybody be able to recommend a standalone solution (essentially, data must not leave my machine) to chat with documents through a web interface?

I tried privategpt but results were not great.


Khoj provides exactly that; it runs on your machine, none of your data leaves your machine, and it has a web interface for chat.


From previous answers it appears you're using standard Llama 7B (quantized to 4 bits). I suppose you're doing a search on the notes, then passing what you found, along with the original query, to Llama. This technique is cool, but there are many limitations, for example Llama's context length.

I can't wait for software that will take my notes each day and fine-tune an LLM on them, so I can use the entire context length for my questions/answers.


> I can't wait for software that will take my notes each day and fine-tune an LLM on them, so I can use the entire context length for my questions/answers

The problem is that finetuning does not work that way. Finetuning is useful when you want to teach a model a certain pattern, not when you want it to recall content correctly. E.g. with enough finetuning and prompting, a model will be able to output results in a certain format that you need, but that does not guarantee it won't be hallucination-prone. The best way to minimize hallucination is still embedding-based retrieval, passed along with the question/prompt.

In the future, there could be a system where you build a knowledge base for LLMs, tell the model to access that for any knowledge, and finetune it for the patterns you want the output in.


Cool project. I tried it last time this got posted, but it was still a bit buggy. Giving it another shot - I'm mainly interested in the local chat.

Could you elaborate on the incremental search feature? How did you implement it? Don't you need to re-encode the full query through SBERT or suchlike as each token is typed (perhaps with debouncing)?

Also, having an easily-extended data connector interface would be awesome, to connect to custom data sources.


Buggy for setup? We've done some improvements and have desktop apps (in beta) too now to simplify this. Feel free to report any issues on the khoj github. I can have a look.

Yes, we don't do optimizations on the query encoding yet. So SBERT just re-encodes the whole query every time. It gets results in <100ms, which is good enough for incremental search.
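Roughly, a sketch of that search-as-you-type loop (not Khoj's code; the model name and corpus are stand-ins):

  from sentence_transformers import SentenceTransformer, util

  encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT model
  notes = ["Bought roses at Whole Foods", "Trip notes: Airbnb in Houston"]
  note_embs = encoder.encode(notes, convert_to_tensor=True)  # indexed once, up front

  def search_as_you_type(partial_query, top_k=5):
      # The full partial query is re-encoded on every keystroke;
      # at <100ms per query that still feels incremental.
      query_emb = encoder.encode(partial_query, convert_to_tensor=True)
      hits = util.semantic_search(query_emb, note_embs, top_k=top_k)[0]
      return [(notes[h["corpus_id"]], h["score"]) for h in hits]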

I did create a plugin system, so a data plugin just has to convert the source data into a standardized intermediate jsonl format. But this hasn't been documented or extensively tested yet.
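As an illustration, a data plugin in that scheme just maps source entries to one JSON object per line (a sketch; the field names are hypothetical, not Khoj's actual schema):

  import json

  def entries_to_jsonl(entries, path):
      # Emit the standardized intermediate jsonl format the indexer consumes.
      with open(path, "w", encoding="utf-8") as f:
          for entry in entries:
              f.write(json.dumps({"raw": entry["text"], "file": entry["source"]}) + "\n")

  entries_to_jsonl([{"text": "TODO buy flowers", "source": "notes.org"}], "notes.jsonl")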


Interesting. The obvious question you haven't answered anywhere (as far as I can see) is: what are the hardware requirements to run this locally?


Ah, you're right, forgot to mention that. We use the Llama 2 7B, 4-bit quantized model. The machine requirements are:

Ideal: 16GB (GPU) RAM

Less ideal: 8GB RAM and CPU


So just to clarify, is that: ideal is running the model on a GPU (any brand? Nvidia, AMD, etc.?) with 16GB of GPU RAM; less ideal is running it on the CPU, for which it needs 8GB of system RAM? Presumably it will occupy all that memory while it's running?

What about if I have a GPU with 8GB?


Khoj uses Llama 2 7B, 4-bit quantized. So it just needs 3.5GB of RAM (GPU or system) [1].

Khoj and your other apps need more RAM themselves, so practically 8GB of system or GPU RAM should suffice.

Khoj has been tested with CUDA- and Metal-capable GPUs, so Nvidia and Mac M1+ GPUs should work. I think it'll work with AMD GPUs out of the box too, but let me know if it doesn't for you? I can look into what needs to be done to get that to work.

[1]: The calculation is [params] * [bytes per param] GB RAM, so 7 * 0.5 = 3.5GB.
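That rule of thumb as code (a rough estimate only; it ignores the context window, KV cache and runtime overhead):

  def model_ram_gb(params_billions: float, bits_per_weight: int) -> float:
      # [params in billions] * [bytes per weight] ~= GB needed for the weights
      return params_billions * bits_per_weight / 8

  print(model_ram_gb(7, 4))   # Llama 2 7B at 4-bit  -> 3.5 GB
  print(model_ram_gb(13, 4))  # Llama 2 13B at 4-bit -> 6.5 GB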


Sorry for the repetition, but do you mean 16GB VRAM? That is a very high requirement; an RTX 4060 only has 8GB, and even an RTX 4070 only ships with 12GB. Any upcoming optimizations for reducing memory usage?

PS. Nice to see a Hindi name for a piece of software. For those who don't speak Hindi: https://en.m.wiktionary.org/wiki/%E0%A4%96%E0%A5%8B%E0%A4%9C...


One of the reasons 4090s/3090s are expensive these days. It's an issue with the models, not with Khoj.


I'm in search of a new MacBook Mx. What are the requirements for running these models locally without breaking the bank? Would 32GB be enough?


You do not need to break the bank to use Khoj for local chat; 16GB RAM should be good enough.


How slow would that be on an old non-Apple laptop, also with 16GB RAM?

lscpu output:

  Architecture:          x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         36 bits physical, 48 bits virtual
  Byte Order:            Little Endian
  CPU(s):                8
  On-line CPU(s) list:   0-7
  Vendor ID:             GenuineIntel
  Model name:            Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
  CPU family:            6
  Model:                 58
  Thread(s) per core:    2
  Core(s) per socket:    4
  Socket(s):             1
  Stepping:              9
  CPU(s) scaling MHz:    35%
  CPU max MHz:           3400.0000
  CPU min MHz:           1200.0000
  BogoMIPS:              4791.90
  Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    1 MiB (4 instances)
  L3:                    6 MiB (1 instance)
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-7


It should work but not sure how fast. Best way to find out would be to try it.

We also have some fixes and perf improvements we plan to release later today in version 0.10.1.


This would be even better if available as a Spotlight Search replacement (with some of the additional features that Spotlight supports).


Should be easy to plug it in with a Raycast.app or Alfred.app plugin.


Khoj exposes a local API. Hopefully that makes it easy to integrate with Raycast, Alfred (or even Spotlight)?


Yeah, this would be ideal for Mac users. Just need to look into what is required and how much work it is


Somewhat unrelated, but does anyone have links to share that walk you through taking a Llama 2 model and feeding it local data - Confluence links, Google Docs, plain text documents, etc.? I came across embeddings and langchain, but was curious if people had thoughts on better ways to go about it as a newcomer experiment.


Any chance it can look at GitLab as well? I like the idea, but I'm not giving all my work to Microsoft.


I tried the search using a Slavic language (all my notes are in Slovene) - it performed very poorly: if the searched keyword was not directly in the note itself, the search results seemed to be more or less random.


Search should work with Slavic languages including Russian and 50+ other languages.

You'll just need to configure the asymmetric search model khoj uses to paraphrase-multilingual-MiniLM-L12-v2 in your ~/.khoj/khoj.yml config file

See http://docs.khoj.dev/#/advanced?id=search-across-different-l...


Khoj chat with Llama 2 will not work with non-English languages though. You'll have to enable OpenAI for that.


Yes, I can confirm. I have many articles in Russian, and the search cannot find relevant information; but if I search in English it works fine and can find documents that use English.


So you're saying you got no results when searching for patterns which did not exist in the dataset...?


There was always a full list of results. They just weren't relevant.


Feedback for the landing page: use a fixed-height container for the example prompts. Without it, the page jumps while scrolling, making other sections hard to read. (iOS Safari)


Thanks for the feedback! Someone else mentioned this issue the other day as well. I'll fix this issue on the landing page soon


Something I've noticed playing around with Llama 7B/13B on my Macbook is that it makes clear just how little RAM 16GB really is these days. I've had a lot of trouble running both inference and a web UI together locally when browser tabs take up 5GB alone. Hopefully we will see a resurgence of lightweight native UIs for these things that don't hog resources from the model.


FWIW I've also had browser RAM consumption issues in life, but it's been mitigated by extensions like OneTab: https://chrome.google.com/webstore/detail/onetab/chphlpgkkbo...

For now, local LLMs take up an egregious amount of RAM, totally agreed. But we trust the ecosystem is going to keep improving and growing, and we'll be able to make improvements over time. They'll probably become efficient enough that we can run them on phones, which will unlock some cool scope for Khoj to integrate with on-device, offline assistance.


The new Chrome "memory saver" feature that discards the contents of old tabs saves a lot of memory for me. Tabs get reloaded from the server if you revisit them.


Or hopefully we will see an end of the LLM hype.

Or at least models that don’t hog so much RAM.


> Or at least models that don't hog so much RAM

The RAM usage is kind of the point, though; we're trading space for time. It's not a problem that the model is using it; it's just that with the default choice of UI now being web-based, the unnecessary memory usage of browsers is actually starting to be a real pain point.


1. I hear you on going back to lightweight native apps. Unfortunately the Python ecosystem is not great for this. We use pyinstaller to create the native desktop app but it's a pain to manage.

2. The web UI isn't required if you use Obsidian or Emacs. That's just a convenient, generic interface that everyone can use.


It would be pretty awesome if this could be hooked up into Jira and Confluence as well!


Hi, my dream app! Will it work on non-English sources?


To use chat with non-English sources you'll need to enable OpenAI. Offline chat with Llama 2 can't do that yet.

And Search can be configured to work with 50+ languages.

You'll just need to configure the asymmetric search model khoj uses to paraphrase-multilingual-MiniLM-L12-v2 in your ~/.khoj/khoj.yml config file

For setup details see http://docs.khoj.dev/#/advanced?id=search-across-different-l...


Thank you for your reply. I was thinking of using a translation model to translate all of my documents to English before indexing them.


How does one access this from a web browser?


We have a cloud product you can sign up for, but it's more limited in which data sources it supports. It currently only works for Notion and GitHub indexing. If you're interested in that, send me a DM on Discord - https://discord.gg/BDgyabRM6e

But that would allow you to access Khoj from the web.


Congrats guys!


Thanks! :)


Will it work on Linux (Ubuntu)?


Yes, of course


Markdown doesn't work on HN...


[flagged]


Please don't post low-effort, shallow dismissals; without substantiation you're not posting anything useful, you're just a loud asshole.


[flagged]


2.5 years! We're kind of slow :P


Hi, you seem keen to share something neat you took less than 10 minutes to implement; I'd love to see that?



