Custom Embeddings: Why going viral caused us to rip out everything in a weekend (buildt.ai)
96 points by Buoy on Feb 23, 2023 | 31 comments



Not sure what the product is

But the advice to essentially fine-tune your embeddings with a custom matrix is good

There are also other embedding platforms (other than OpenAI's) that have built-in fine-tuning functionality
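
For anyone wondering what that looks like in practice, here's a minimal sketch of the custom-matrix idea (my own illustration, not the article's or OpenAI's cookbook code): learn a matrix that projects frozen embeddings into a space where cosine similarity better matches your own labelled positive/negative pairs.

    import torch

    def train_custom_matrix(emb_a, emb_b, labels, out_dim=256, epochs=200, lr=1e-2):
        # emb_a, emb_b: (n, d) tensors of precomputed (frozen) embeddings
        # labels: (n,) tensor, 1.0 for related pairs, -1.0 for unrelated pairs
        d = emb_a.shape[1]
        M = torch.randn(d, out_dim, requires_grad=True)
        opt = torch.optim.Adam([M], lr=lr)
        cos = torch.nn.CosineSimilarity(dim=1)
        for _ in range(epochs):
            sim = cos(emb_a @ M, emb_b @ M)      # similarity in the projected space
            loss = ((sim - labels) ** 2).mean()  # push similarities toward the labels
            opt.zero_grad(); loss.backward(); opt.step()
        return M.detach()

    # At index/query time you store and compare (embedding @ M) instead of the raw
    # embedding; the upstream embedding model itself is never retrained.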


There is a description at the bottom

> Buildt is an AI tool to help developers quickly search and understand large codebases.


Could you please provide some alternative embedding platforms?


At Metal (http://getmetal.io/) we're currently building a fine-tuning platform. We host, index, and version embeddings, and provide an easy way to manage fine-tuning jobs as well.


FYI, your first paragraph is duplicated under the image.


I feel like there's also a paragraph missing explaining the product, the problem, and how they were using LLMs initially.


I've no idea what the 104 million / 143 million numbers in the spreadsheet-grid thing mean - anyone?


Tokens, I think? There's an implication of a per-token cost, but without knowing how they were architected before it's hard to understand how it fits. If they were running their own model on GPUs I'd expect to see things in terms of tokens/sec, so I assume they were instead using a cloud-hosted model where they have to pay per token?

The blog post really suffers from "written by someone who knows what they're talking about"; it could have done with a review before publication by someone who doesn't know the space. I go through that exercise any time I'm writing stuff for consumption outside of my service teams.


They say it cost 'thousands of dollars', so I infer that the 'millions' must be either raw characters or BPE tokens sent to the API.


Didn't you read the first paragraph... twice? Oh right, it's not in there.


I believe it refers to the number of tokens. So that's $0.02 per 1,000 tokens for OpenAI's Davinci model.
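
If that's right, a rough back-of-envelope check (my assumption, not a figure from the article): 104 million tokens x $0.02 per 1,000 tokens is about $2,080, and 143 million is about $2,860, which lines up with the "thousands of dollars" mentioned above.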


Would you mind providing a pointer to the part of the OpenAI cookbook which explains how to build the matrix you multiply the embeddings by?

Sorry if it was there and I missed it. Thanks!



Thanks!


Great post -- one of our core offerings at https://getmetal.io is to manage the process of using custom embeddings, very similar to your approach here! We'd love to connect and talk through some of the pain points you came across -- I'll send you a message in the W23 Slack!


This doesn’t make any sense to me: how does fine-tuning the embeddings save money? It seemed like the problem was having to make too many API calls to generate the embeddings in the first place.


Embeddings are often used as features for these LLMs, so before they were paying both to generate embeddings and to do inference with a large model. Now they pay to generate embeddings, fine-tune them, and do semantic search (probably approximate k-nearest neighbors). The hardware requirements of most LLMs make them much more expensive than approximate KNN against a vector database.
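
As a rough sketch of what "embeddings + approximate KNN" means in practice (illustrative only; the dimensions, data, and index choice are my assumptions, not what Buildt uses), e.g. with FAISS:

    import numpy as np
    import faiss  # vector similarity search library

    dim = 1536                      # e.g. the size of an OpenAI ada-002 embedding
    index = faiss.IndexFlatIP(dim)  # inner-product index (swap in IndexIVFFlat etc. for true ANN)

    # doc_embeddings: (n_docs, dim) float32 array of precomputed embeddings
    doc_embeddings = np.random.rand(1000, dim).astype("float32")
    faiss.normalize_L2(doc_embeddings)   # normalize so inner product = cosine similarity
    index.add(doc_embeddings)

    # one query embedding, also normalized; returns the top-5 most similar documents
    query = np.random.rand(1, dim).astype("float32")
    faiss.normalize_L2(query)
    scores, ids = index.search(query, 5)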


They went from indexing with embeddings + LLM to just using a biased embedding for their use case. This should save them most of their costs.


Maybe this helps people understand what they are doing at index time.

* Version 1. Ask the LLM to describe the code snippet. Create an embedding of the description. LLM generation + embeddings required.

* Version 2. Run the code snippet directly through the embedding API, skipping the LLM text generation step. Then run the resulting embedding through the bias matrix and index the result.

I assume this only works because they fine-tuned a bias matrix on code-snippet/text pairs. Feels more like a light version of transfer learning to me.
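
In other words, roughly (my paraphrase as pseudocode; llm_describe, embed, and M are placeholders, not Buildt's actual API):

    # V1: describe-then-embed (pay for LLM generation AND embeddings)
    def index_snippet_v1(snippet):
        description = llm_describe(snippet)   # expensive completion call
        return embed(description)             # cheap embedding call

    # V2: embed-then-project (embeddings only, plus a learned bias matrix M)
    def index_snippet_v2(snippet, M):
        e = embed(snippet)                     # cheap embedding call
        return e @ M                           # project into the shared code/text space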

The article was a little unclear in the actual approach for V2 so if I have anything wrong please correct me.


I wouldn’t say most—maybe a factor of 2. Getting the embedding is still an API call to an LLM.


I’m pretty sure they were using a high-cost LLM to summarize, and for embeddings you only need Ada, which is orders of magnitude cheaper.


> We considered trying to use a self-hosted LLM as an alternative, but the costs would also have been extremely high for the amount of traffic we were processing.

Is it realistic to self-host an LLM that outperforms OpenAI's offerings cost-wise? When I looked at the alternatives (self-hosted, alternative hosted LLM providers, or cloud compute options) you generally ended up with a subjectively worse model AND lower inference speed - which resulted in me canning my idea as it was simply too expensive.


Flan-T5 is much smaller than GPT-3, but was trained on significantly more data resulting in competitive accuracy. It is also Apache licensed. I wonder if that model is fast enough for enough use cases to make it cost effective?


You can give the XL (3B parameter) model a try here (would recommend a Colab Pro account): https://colab.research.google.com/drive/1Hl0xxODGWNJgcbvSDsD...

In my Colab Pro it's running this on an A100 (which is a very beefy GPU) and inference is very fast and definitely suitable for interactive use. On a T4 GPU (which is much cheaper) inference is still alright and probably OK for interactive use.
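
For reference, running the XL checkpoint yourself takes only a few lines with Hugging Face transformers (a minimal sketch; the prompt and generation settings here are just illustrative):

    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
    model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl").to("cuda")

    inputs = tokenizer("Explain what a vector database is.", return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))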


I think Flan-T5 is fast enough, but I don't think it generates text or abstract reasoning at nearly the same level as current GPT-3 models. This indicates a deficiency in the benchmarks and metrics that we use to evaluate LLMs. For generating embeddings it might work well enough though.


It's certainly not quite as good out of the box, at least with the open-sourced checkpoints. However, so far I've found it can achieve similar accuracy with enough examples and/or fine-tuning for my use cases. Like everything, it depends on what you are doing too.


For embeddings, it may be overkill. Smaller BERT-type models can provide good embeddings when fine-tuned with a contrastive learning objective. E.g.: https://sbert.net.
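
For example, with the sentence-transformers library (the model and data here are just an illustration):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small BERT-style encoder, ~80 MB

    docs = ["def add(a, b): return a + b", "function to multiply two numbers"]
    embeddings = model.encode(docs, normalize_embeddings=True)

    query = model.encode("how do I sum two values?", normalize_embeddings=True)
    scores = util.cos_sim(query, embeddings)  # cosine similarity against the corpus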


Fine-tuning on smaller models like GPT-J (also trained on The Pile) worked well for Toolformer:

https://arxiv.org/abs/2302.04761


I think you should have put a bit more thought into planning for scale. There’s a difference between not overengineering and not actually doing the basic maths to figure out the unit economics of your business model ahead of setting it live to the whole world.


What is a VI, an LLM and a bias matrix?


VI: it's actually V1, but the blog uses a horrible font where you can't differentiate a 1 from an l (seriously, who thinks that's a good idea?).

LLM: large language model

Bias Matrix: part of fine tuning a model: https://platform.openai.com/docs/guides/fine-tuning



