At Metal (http://getmetal.io/) we're currently building a fine-tuning platform. We host, index, and version embeddings, and provide an easy way to manage fine-tuning jobs as well.
Tokens, I think? There's an implication that there's a per-token cost, but I have no idea how they were architected before, so it's hard to understand how it fits. If they were running their own model on GPUs I'd expect to see things in terms of tokens/sec, so I assume instead they were using some hosted model where they have to pay per token?
The blog post really suffers from being written by someone who knows what they're talking about; it could have done with a pre-publication review by someone who doesn't know the space. I go through that exercise any time I'm writing something for consumption outside of my service teams.
Great post -- one of our core offerings at https://getmetal.io is managing the process of using custom embeddings, very similar to your approach here! We'd love to connect and talk through some of the pain points you came across -- I'll send you a message in the W23 Slack!
This doesn’t make any sense to me: how does fine-tuning the embeddings save money? It seemed like the problem was having to make too many API calls to generate the embeddings in the first place.
Embeddings are often used as features for these LLMs, so before, they were paying to generate embeddings and to do inference with these large models. Now they pay to generate embeddings, fine-tune them, and do semantic search (probably approximate k-nearest neighbors). The hardware requirements of most LLMs make them much more expensive than approximate KNN against a vector database.
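For a sense of what the cheaper path looks like, here's a minimal sketch of approximate KNN over precomputed embeddings using FAISS's HNSW index. The library, index type, and parameters are my own illustration, not necessarily what the post's authors use:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 1536                                               # embedding dimension (e.g. OpenAI ada-002)
corpus = np.random.rand(10_000, d).astype("float32")   # stand-in for stored embeddings
faiss.normalize_L2(corpus)                             # unit-normalize so inner product == cosine similarity

# HNSW graph index: approximate nearest neighbors, no GPU required.
index = faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)
index.add(corpus)

query = np.random.rand(1, d).astype("float32")         # stand-in for a query embedding
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)                   # top-5 nearest neighbors
print(ids, scores)
```

Compared with running every query through a multi-billion-parameter model, this is a small in-memory lookup, which is where the cost savings come from.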
Maybe this helps people understand what they are doing at index time.
* Version 1. Ask the LLM to describe the code snippet. Create an embedding of the description. LLM generation + embeddings required.
* Version 2. Run the code snippet directly through the embedding API, skipping the LLM text-generation step. Then run the resulting embedding through the bias matrix and index the result.
I assume this only works because they fine-tuned a bias matrix on code snippet and text pairs. Feels more like a light version of transfer learning to me.
The article was a little unclear about the actual approach for V2, so if I have anything wrong please correct me.
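If my reading of V2 is right, the "bias matrix" is just a learned linear transform applied to the raw embedding before indexing, and again to queries at search time. A minimal sketch of that idea; the matrix, shapes, and the stand-in API call are my assumptions, not the article's code:

```python
import numpy as np

d = 1536  # raw embedding dimension (e.g. OpenAI ada-002)

# Hypothetical learned matrix; in practice it would be fit on (code snippet, text) pairs.
rng = np.random.default_rng(0)
bias_matrix = rng.normal(size=(d, d)).astype("float32")

def embed_with_api(text: str) -> np.ndarray:
    """Stand-in for the embeddings API call; returns a random unit vector here."""
    v = rng.normal(size=d).astype("float32")
    return v / np.linalg.norm(v)

def customize(raw_embedding: np.ndarray) -> np.ndarray:
    """Project a raw embedding through the learned matrix and re-normalize."""
    v = raw_embedding @ bias_matrix
    return v / np.linalg.norm(v)

# Index time (V2): embed the code snippet directly, then project it -- no LLM description step.
code_embedding = customize(embed_with_api("def parse_csv(path): ..."))

# Query time: project the query the same way so both live in the same fine-tuned space.
query_embedding = customize(embed_with_api("how do I read a CSV file?"))
similarity = float(code_embedding @ query_embedding)  # cosine similarity (both unit-normalized)
```

The appeal is that the expensive LLM call drops out of the indexing path entirely; only the cheap embedding call and a matrix multiply remain.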
> We considered trying to use a self-hosted LLM as an alternative, but the costs would also have been extremely high for the amount of traffic we were processing.
Is it realistic to self-host an LLM that outperforms OpenAI's offerings cost-wise? When I looked at the alternatives (self-hosted, alternate hosted LLM providers, or cloud compute options) you generally ended up with a subjectively worse model AND lower inference speed, which led me to can my idea as it was simply too expensive.
Flan-T5 is much smaller than GPT-3 but was trained on significantly more data, resulting in competitive accuracy. It is also Apache-licensed. I wonder if that model is fast enough for enough use cases to make it cost-effective?
In my Colab Pro it's running on an A100 (which is a very beefy GPU) and inference is very fast, definitely suitable for interactive use. On a T4 GPU (which is much cheaper) inference is still alright and probably OK for interactive use.
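For anyone who wants to try it themselves, here's a minimal Flan-T5 inference sketch with Hugging Face transformers; the checkpoint size and generation settings are just illustrative, not a tuned setup:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# flan-t5-xl fits comfortably on an A100; the base/large checkpoints run fine on a T4.
model_name = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")

prompt = "Summarize: FAISS is a library for efficient similarity search over dense vectors."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```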
I think Flan-T5 is fast enough, but I don't think it generates text or handles abstract reasoning at nearly the same level as current GPT-3 models. That suggests a deficiency in the benchmarks and metrics we use to evaluate LLMs. For generating embeddings it might work well enough, though.
It's certainly not quite as good out of the box, at least with the open-sourced checkpoints. However, so far I've found it can achieve similar accuracy with enough examples and/or fine-tuning for my use cases. Like everything, it depends on what you are doing too.
For embeddings, it may be overkill. Smaller BERT-type models can provide good embeddings when fine-tuned with a contrastive learning objective, e.g. https://sbert.net.
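As an illustration of how lightweight the SBERT route is, here's a short sketch with the sentence-transformers library; the model name is just one common small checkpoint, not a recommendation from the thread:

```python
from sentence_transformers import SentenceTransformer, util

# A small BERT-style bi-encoder (~80 MB), fast enough to run on CPU.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I read a CSV file in Python?",
    "def parse_csv(path): return list(csv.reader(open(path)))",
    "The weather is nice today.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity between the query and the two candidates.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # the code snippet should score higher than the unrelated sentence
```

For domain adaptation, the same library exposes contrastive losses (e.g. MultipleNegativesRankingLoss) so you can fine-tune on your own (query, relevant passage) pairs.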
I think you should have put a bit more thought into planning for scale. There's a difference between not over-engineering and not doing the basic maths to figure out the unit economics of your business model before opening it up to the whole world.
But the advice to essentially fine-tune your embeddings with a custom matrix is good.
There are also other embeddings platforms (other than OpenAI's) that have built-in fine-tuning functionality.