Go library for in-process vector search and embeddings with llama.cpp

kelindar · 2024-10-28T06:01:42 1730095302

This library was created to provide an easy and efficient solution for embeddings and vector search, making it perfect for small to medium-scale projects that still need some vector search. It's built around a simple idea: if your dataset is small enough, you can achieve accurate results with brute-force techniques, and with some optimizations like SIMD, you can keep things fast and lean.

biomcgary · 2024-10-29T22:43:21 1730241801

I love that you chose to wrap the C++ with purego instead of requiring CGO! I wrapped Microsoft's Lightgbm library and found purego delightful. (To make deployment easier, I embed the compiled library into the Go binary and extract it to a temp directory at runtime. YMMV.)

cyberax · 2024-10-30T00:52:12 1730249532

This post led me to purego, and I've just finished moving my toy project that uses PKCS#11 libraries from cgo to it. It's so much better now! No need to jump through hoops for cross-compilation.

mappu · 2024-10-30T01:51:54 1730253114

IME Linux and macOS users usually have a compiler available so CGO is mostly only a hassle for Windows, but on Windows this capability is built into the Go stdlib, e.g. `syscall.NewLazyDLL("msvcrt.dll").MustFindProc(...)`

biomcgary · 2024-10-30T16:37:10 1730306230

Thank you for pointing out this option. Any idea why the Go stdlib doesn't offer this for Linux and macOS? I'd rather not add compiling other languages to my Go workflow.

ashvardanian · 2024-10-30T13:43:09 1730295789

How is the latency of calling purego bindings vs cgo? The latter seems prohibitively expensive for most of my projects.

biomcgary · 2024-10-30T16:33:13 1730305993

IIRC, purego repurposes a lot of cgo machinery, so I don't think there would be much difference. For my purposes, it doesn't matter since the ML library does several seconds to minutes of work using multiple cores per call.

cyberax · 2024-10-30T15:42:57 1730302977

I haven't checked (I make maybe 10 calls per second at most). Intuitively, they should be similar.

jerrygenser · 2024-10-30T01:52:01 1730253121

Have you considered using HNSW instead of brute force?

huac · 2024-10-30T02:20:46 1730254846

nice work! I wrote a similar library (https://github.com/stillmatic/gollum/blob/main/packages/vect...) and similarly found that exact search (w/the same simple heap + SIMD optimizations) is quite fast. with 100k objects, retrieval queries complete in <200ms on an M1 Mac. no need for a fancy vector DB :)

that library used `viterin/vek` for SIMD math: https://github.com/viterin/vek/

neonsunset · 2024-10-30T07:49:41 1730274581

Look what Go needs to mimic even a fraction of .NET’s SIMD power… ;)

PhilippGille · 2024-10-30T07:13:20 1730272400

Interesting choice to call llama.cpp directly, instead of relying on a server like Ollama. Nice!

I wrote a similar library which calls Ollama (or OpenAI, Vertex AI, Cohere, ...), with one benefit being zero library dependencies: https://github.com/philippgille/chromem-go

milansuk · 2024-10-30T11:58:30 1730289510

No need to use Ollama. LLama.cpp has its own OpenAI-compatible server[0] and it works great.

[0] https://github.com/ggerganov/llama.cpp#web-server

citizenpaul · 2024-10-31T18:07:18 1730398038

Thanks didn't know that.

Do you happen to know the reason to use ollama rather than the built in server? How much work is required to get similar functionality? looks like just downloading the models? I find it odd that ollama took off so quickly if LLamma.cpp had the same built in functionality.

PhilippGille · 2024-10-30T19:58:18 1730318298

Yes I'm aware. I was contrasting the general use of an inference server vs calling llama.cpp directly (not via HTTP request).

And among servers Ollama seems to be more popular, so it's worth mentioning when talking about support for local LLMs.

kohlerm · 2024-10-30T13:18:00 1730294280

Nice! would have needed something like this last year.

ashvardanian · 2024-10-30T13:41:29 1730295689

USearch has had GoLang bindings for a long time, but it's more low-level and you'd have to use something else for embeddings: https://github.com/unum-cloud/usearch/tree/main/golang

ausbah · 2024-10-30T02:25:17 1730255117

could anyone recommend a similar library for python?

simonw · 2024-10-30T02:59:45 1730257185

I've used the Sentence Transformers Python library successfully for this: https://www.sbert.net/

My own LLM CLI tool and Python library includes plugin-based support for embeddings (or you can use API-based embeddings like those from Jina or OpenAI) - here's my list of plugins that enable new embeddings models: https://llm.datasette.io/en/stable/plugins/directory.html#em...

More about that in my embeddings talk from last year: https://simonwillison.net/2023/Oct/23/embeddings/

jncraton · 2024-10-30T13:39:05 1730295545

The languagemodels[1] package that I maintain might meet your needs.

My primary use case is education, as myself and others use this for short student projects[2] related to LLMs, but there's nothing preventing this package from being used in other ways. It includes a basic in-process vector store[3].

[1] https://github.com/jncraton/languagemodels

[2] https://www.merlot.org/merlot/viewMaterial.htm?id=773418755

[3] https://github.com/jncraton/languagemodels?tab=readme-ov-fil...

fjuafhwasd · 2024-10-30T02:10:35 1730254235

Do these queries complete within 10ms?

38 · 2024-10-30T00:33:27 1730248407

> git submodule update --init --recursive

nope. this looks cool, but Git submodules are cursed

IncreasePosts · 2024-10-30T01:34:46 1730252086

I think you mean recursed

snovv_crash · 2024-10-30T07:59:46 1730275186

What's a better option for linking 3rd party code?

38 · 2024-10-30T20:18:21 1730319501

Is this a joke? Go has built in support for importing 3rd party code

mappu · 2024-10-31T05:27:24 1730352444

Go has built-in support for importing Go modules, but the submodule is for a C++ library not a Go module, so your suggestion isn't workable.

o11c · 2024-10-30T02:40:42 1730256042

Dip it in a blessed clear potion.

pipe01 · 2024-10-30T11:49:54 1730288994

mananaysiempre · 2024-10-30T12:32:52 1730291572

Poor integration, mostly.

It’s fairly easy to get into an irrecoverably broken state using an intermediate-level Git operation such as an interactive rebase (as of a couple of years ago). (It’s probably recoverable by reaching into the guts of the repo, but given you can’t do the rebase either way I’m still taking off a point.) The distinguished remote URLs thing is pointlessly awkward—I’ve never gotten pushing to places where those remotes are inaccessible to work properly when the pushed commit updates the submodule reference. (I believe it’s possible, but given the amount of effort I’ve put into unsuccessfully figuring that out, I’m comfortable taking off a point here as well.)

I like git submodules, I think they’re fundamentally the right way to do things. But despite their age they aren’t in what I’d call a properly working state, even compared to Git’s usual amount of sharp edges.