Hacker News
Using GPT3, Supabase and Pinecone to automate a personalized marketing campaign (vimota.me)
252 points by vimota on Feb 25, 2023 | 62 comments

> My script read through each of the products we had responses for, called OpenAI's embedding api and loaded it into Pinecone - with a reference to the Supabase response entry.

OpenAI and the Pinecone database are not really needed for this task. A simple SBERT encoding of the product texts, followed by storing the vectors in a dense numpy array or Faiss index, would be more than sufficient. Especially if one is operating in batch mode, the locality and simplicity can't be beat, and you can easily scale to 100k-1M texts in your corpus on commodity hardware/VPS (though an NVMe disk will see a nice performance gain over a regular SSD).

Yep that's true! I'd probably do something like that if I were starting again, but the ease of calling a few APIs is pretty nice. I feel like that alone will drive a lot of adoption of some of these platforms even if it can just be done locally.

I'd argue that using something like SBERT + Faiss is easier and would take less time (you avoid two account creations plus one billing setup), and a working example of SBERT + Faiss probably totals less than 10 lines of code.

What does it look like? I've never heard of either.

  # pip install faiss-cpu sentence-transformers
  from sentence_transformers import SentenceTransformer
  import faiss
  # replace with your own texts - this is a bad example since it contains only single words
  with open("/usr/share/dict/words", mode="r") as infile:
      corpus = { num: s.strip() for num, s in enumerate(infile) }
  # encode the corpus using a good sentence transformer model - will be slow if no GPU
  model = SentenceTransformer("all-mpnet-base-v2")
  corpus_vectors = model.encode(sentences=list(corpus.values()))
  # construct a faiss kNN index and add the corpus vectors to it
  num_vectors, num_dimensions = corpus_vectors.shape
  index = faiss.index_factory(num_dimensions, "L2norm,Flat")
  index.add(corpus_vectors)
  # optional: save the populated index for reuse
  faiss.write_index(index, "/tmp/corpus_index.bin")
  # index = faiss.read_index("/tmp/corpus_index.bin")
  # encode target text and find its 10 nearest neighbors in the index
  target_vector = model.encode(sentences=["apples"])
  distances, nearest_indexes = index.search(target_vector, 10)
  print(list(zip([corpus[i] for i in nearest_indexes[0]], distances[0])))
  # [('apples', 4.382169e-13), ('fruits', 0.47413948), ('fruit', 0.57227534), ...

This is great, thanks!

Perhaps the Faiss dependency can be dropped? Supabase supports the pg_vector extension now.

Great result!

I come to HN for posts like this, thank you.

most HN comment ever

had the same reaction to the jargon...

...but I said WTH, googled SBERT and followed my nose and got it installed in minutes on my Mac and they kindly included a cut/paste example of semantic search.

The link is missing a slash after the .net


(Dude you went to the trouble of pointing it out but didn't post a fixed one to save folks typing, you rascal :P )

Agreed - will check it out as well.

OK - I did check it out. Pretty cool. Some learnings: https://twitter.com/deepwhitman/status/1630091600465641474

I appreciated (and understood) the comment.

If you have a lot of data, it's cheaper and more efficient to build your own batch inference application and just use two well-known libraries (SBERT and the FAISS indexing library). That hadn't occurred to me. I come here for insights - and I got one here.

Yes. Very Dropbox.

Not really. That famous Dropbox comment was questioning the value of the product while offering a more difficult way to get the same result.

Commenter here is not questioning the value of any product, but offering a potentially simpler way of achieving the same thing.

The DropBox comment was also “offering” a simpler way of storing data using rsync.

The issue is that these guys solved a problem and are seeing the monetary benefit, but the first comment is explaining in tech jargon how to solve this problem “better”. Can’t that user just accept the end result is the same, but OP got there without in-depth knowledge of technology that can solve this faster?

I personally think there's a difference between saying "I have qualms with this app" and questioning its usefulness as an app as well as its viability as a business (Dropbox comment) and someone offering an alternative/simpler way to achieve a cool result in a blog post (parent comment). The latter is useful for folks hoping to replicate the results (for instance, I now know I can use FAISS instead of having to go sign up for pinecone).

OP seems to feel the same, they replied to parent.

What is the famous Dropbox comment?

Thanks <3

Very HN thread, I dig it

> And it pretty much worked! Using prompts to find matches is not really ideal, but we want to use GPT's semantic understanding. That's where Embeddings come in.

Sounds like you ended up not using GPT3 in the end, which is probably wise.

I'm curious if you might see further savings using other cheaper embeddings available on Hugging Face, but it's probably not material at this point.

Did you also consider using pgvector instead of Pinecone? https://news.ycombinator.com/item?id=34684593 Any pain points with Pinecone you can recall?

I've seen really good results using BERT and other open-source models for matching symptoms / healthcare service names to CPT/HCPCS code descriptions. Even for specialized domains, some of the freely available models perform well for ANN search. While BERT may not be viewed as state-of-the-art, versions of it have relatively low memory requirements versus newer models which is nice if you're self-hosting and/or care about scalability.

I’m curious what problems you’ve applied this to. Would love to chat if you are open. (My email is in my profile)

I didn't use the GPT3 autocomplete API in the end (though I did play around with it) but did use the embeddings API (which I believe is still considered part of the "GPT3" model, but I could be wrong!).

I totally could! I think each use case should dictate which model you use; in my case I was not super cost- or latency-sensitive since it was a small dataset and I cared more about accuracy. But I'm planning on using something like https://huggingface.co/sentence-transformers/all-MiniLM-L6-v... for my next project where latency and cost will matter more :)

I have a lot of thoughts around that last question! The Supabase article came out way after I implemented this (August of last year), so I didn't even think to do that - not sure it was even supported back then - but I'd probably reach for it if I were re-doing the project, to reduce the number of systems I needed. I think the power of having the vector search done in the same DB as the rest of the data is that sometimes you want structured filtering before the semantic/vector ranking (i.e. only select user N's items and rank by similarity to <query>), which is trickier to do in Pinecone. They support metadata filtering, but it feels like an afterthought. For the project I'm working on now (https://pinched.io), we'd like to filter on certain parameters as well as rank by relevance, so I'm going to explore combining structured querying with semantic search (i.e. pgvector, or something similar on DuckDB if it adds support for this).
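The "structured filter first, then semantic ranking" idea can be sketched in plain Python with toy in-memory data (in pgvector this would be a WHERE clause plus an ORDER BY on vector distance; the items and vectors below are invented for illustration):

```python
# Sketch of "filter, then rank": restrict to one user's rows first,
# then order the survivors by cosine similarity to a query vector.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

items = [
    {"user_id": 1, "name": "face cream",  "vec": [0.9, 0.1, 0.0]},
    {"user_id": 1, "name": "shampoo",     "vec": [0.1, 0.9, 0.0]},
    {"user_id": 2, "name": "moisturizer", "vec": [0.8, 0.2, 0.0]},
]

def search(user_id, query_vec, k=5):
    # structured predicate first (what a SQL WHERE clause would do) ...
    candidates = [it for it in items if it["user_id"] == user_id]
    # ... then the semantic ranking
    candidates.sort(key=lambda it: cosine(it["vec"], query_vec), reverse=True)
    return [it["name"] for it in candidates[:k]]

print(search(1, [1.0, 0.0, 0.0]))  # ['face cream', 'shampoo']
```

User 2's "moisturizer" is excluded before ranking even though it is semantically closest to the query, which is exactly the behavior that is awkward to express with Pinecone's metadata filtering.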

> https://pinched.io

Requested invite! I have a moderately large Twitter following, so it could be a good test heheh. I use https://www.flock.network/ for this stuff normally but the UX isn't that great, so I'm hoping for better.

My understanding is Pinecone is much faster, but for this small search space, I doubt pgvector would be noticeably worse.

I tested an early version of pgvector against faiss and found faiss had much better performance


Creating the index on pinecone takes about a minute. Creating a table on postgres should take a few milliseconds!

I don't like the unscientific ad for his gf's company.

'which helped launch the movement of those opposed to endocrine disruptors, was retracted and its author found to have committed scientific misconduct'

mehh … you might be overthinking it

it’s his blog either way.

I have seen a lot of people write about how important the interaction between vector DBs and ChatGPT (and GPT3) is. I am still not much wiser after this article. Is it that it makes it easier to go from:

user query -> GPT3 response -> Lookup in VectorDB -> send response based on closest embedding in VectorDB


Query (→ GPT3 completion) → vector db lookups → GPT3 synthesis

The optional step 2 is used when a generated hypothetical answer's embedding sits closer to the relevant documents than the raw query text's embedding does. This approach is called HyDE (first published here: https://arxiv.org/abs/2212.10496).

The synthesis is also optional. You can essentially summarize your lookups or refine them or do whatever you want at this stage.

If you skip steps 2 and 4, it's just a semantic search engine. If you skip only step 2, you're either doing it for latency/performance reasons, or because the user query's embeddings are already similar enough to the docs in the vector db.
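Spelled out with stubbed-in functions, the flow might look like the sketch below. The fake_complete / fake_embed / fake_synthesize functions are made-up stand-ins for real model calls (not actual API signatures); only the control flow is the point:

```python
# Sketch of the query -> (GPT3 completion) -> vector lookup -> synthesis pipeline.
def fake_complete(query):                 # optional step 2 (HyDE-style)
    return "hypothetical answer to: " + query

def fake_embed(text):
    # toy deterministic "embedding" - a real system would call a model here
    return [len(text) % 7, len(text) % 5]

def vector_db_lookup(vec, docs, k=2):     # step 3
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sorted(docs, key=lambda d: sq_dist(fake_embed(d), vec))[:k]

def fake_synthesize(query, snippets):     # optional step 4
    return f"answer to {query!r} using {len(snippets)} snippets"

docs = ["doc about apples", "doc about oranges", "doc about pears"]

def answer(query, use_hyde=True):
    text = fake_complete(query) if use_hyde else query
    snippets = vector_db_lookup(fake_embed(text), docs)
    return fake_synthesize(query, snippets)

print(answer("what fruit is red?"))
```

Dropping `use_hyde` gives the plain variant; dropping `fake_synthesize` and returning the snippets directly gives the bare semantic search engine.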

All the embedding-enabled GPT-3 apps I've seen do the following: User query -> Retrieve closest embedding's plaintext counterpart -> feed plaintext as context to GPT-3 prompt.

Is this a form of prompt engineering then?

Your vector DB has well formed prompts - users write random stuff, map it to the closest well formed prompt?

It's more like "prompt augmentation" or "prompt orchestration". Classic example is doing Q&A over a corpus. You can't feed the entire corpus into a GPT3 prompt. So you embed snippets of the corpus on vector space, then when you get a query, you vectorize that and find the nearest neighbor snippets, then send the question and snippets into GPT3 to answer the question (with those snippets as context).
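A toy sketch of that Q&A pattern, with invented snippets and a crude bag-of-words overlap standing in for real embedding similarity (a real system would embed both sides with a model):

```python
# Prompt augmentation: pick the nearest snippets by a toy similarity
# measure and splice them into a question prompt.
snippets = [
    "Returns are accepted within 30 days of purchase.",
    "Shipping is free on orders over $50.",
    "Gift cards never expire.",
]

def overlap(a, b):
    # crude stand-in for vector similarity: shared lowercase tokens
    return len(set(a.lower().split()) & set(b.lower().split()))

def build_prompt(question, k=1):
    best = sorted(snippets, key=lambda s: overlap(s, question), reverse=True)[:k]
    context = "\n".join(best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("When do gift cards expire?"))
```

The resulting string is what actually gets sent to the completion API, with the retrieved snippets as context.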

OP's example is a little different, because he's not even using GPT3 completions; he's just using their embeddings API to vectorize product names, then when he gets a new product name, he maps it into the space to find the nearest product names.

Wouldn't this approach be quite brittle? For example, where would one define snippet boundaries - isn't it possible that extracting a snippet at arbitrary points may change the information within that snippet?

But then you have the issue of GPT3 token limits, so you're limited in how many of these relevant snippets you can embed into a prompt. Wondering if there's a better way to go about this (for your first example, rather than OPs use case).
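One common answer to the token-limit issue is simply greedy packing: take snippets in relevance order and stop before the budget overflows. A minimal sketch (the word count is a crude token estimate; a real tokenizer such as tiktoken would be more accurate, and the snippets and budget here are made up):

```python
# Fit retrieved snippets under a prompt token budget, greedily in
# relevance order.
def pack_snippets(ranked_snippets, max_tokens=12):
    chosen, used = [], 0
    for s in ranked_snippets:
        cost = len(s.split())          # crude token estimate
        if used + cost > max_tokens:
            break
        chosen.append(s)
        used += cost
    return chosen

ranked = [
    "most relevant snippet with six words",
    "a short second snippet",
    "an over-long snippet that will not fit under the remaining budget at all",
]
print(pack_snippets(ranked))  # keeps the first two, drops the over-long one
```

Fancier variants skip the overflowing snippet and keep trying smaller ones, or summarize the overflow instead of dropping it.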

It works surprisingly well, and you can see examples if you look up the documentation of GPT-Index or Langchain (both are libraries designed to enable these types of use-cases, among others). Also, you can get fancy: for instance, you can have GPT3 (or any LLM) create multiple "layers" of snippets (snippets of the actual text, then summaries of a section, then summaries of a chapter), embed all of those, and pull in the relevant pieces. Or, you can go back and forth with the prompt multiple times to give/get more information.

I'm sure the techniques will evolve over time, but for now, these sorts of patterns (pre-index, then augmenting the prompt at query-time) seem to work best for feeding information/context into the model that it doesn't know about. The other broad family of techniques is around trying to train the model with your custom information ("fine-tuning", etc), but I think most practitioners will agree that's currently less effective for these sorts of use-cases. (Disclaimer: I'm not an expert by any means, but I've played around with both techniques and try to keep up-to-date on what the experts are saying).

Excited to see what comes of it. Lots of people will have a private corpus, and the idea that we can semantically query it sounds so interesting.

Like asking 'what streaming services am I paying for and how much have I spent on them to date?', and some tool going over your bank statements to pick out Spotify, Netflix, etc. I could see that being useful.


IMO, prompt engineering is a good umbrella term for all of these kinds of augmentations!

In this context, it is for semantic matching similar to:

"My daily face cream is BrandX's low-sheen formulation" -> "BrandX Matte Face Moisturizer"

To be clear, I didn't use the autocomplete GPT-3 API, just the embeddings one! Pinecone has some good docs on it: https://docs.pinecone.io/docs/openai. Happy to answer any questions :)

Are you saving the match pairs somewhere? I imagine 1) there are a finite number of them, 2) doing an exact lookup in a DB first will be faster and easier than calling GPT3 and Pinecone every time, and 3) eventually GPT3 APIs will get pricey enough to make you think twice unless you're running your own instance on a cluster.
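The cache-first idea is a few lines; here is a sketch where `expensive_match` is a placeholder for the GPT3-embedding + Pinecone round trip (names and behavior invented for illustration):

```python
# Look up an exact match locally before paying for the API round trip.
cache = {}
calls = {"count": 0}

def expensive_match(text):
    calls["count"] += 1          # track how often we actually hit the API
    return text.upper()          # stand-in for the real matched product

def match(text):
    if text not in cache:
        cache[text] = expensive_match(text)
    return cache[text]

match("face cream"); match("face cream"); match("shampoo")
print(calls["count"])  # 2 - the repeated query hit the cache
```

In practice the dict would be a table keyed on the raw input text (Supabase already being in the stack), and `functools.lru_cache` covers the in-process case.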

This looks incredible and magical to me. How do you learn to create things like this as a mostly-web programmer? I had no idea vectorization etc. could integrate with GPT, but honestly it looks kind of obvious/effortless to the author.

Appreciate the kind words, but I'm sure you could pick it up pretty quickly too :) The OpenAI docs are pretty helpful as a starting point: https://platform.openai.com/docs/guides/embeddings/use-cases

Pardon my ignorance here. I started playing around with text generation today and came across plenty of resources, but it's hard to make sense of them all. I had https://github.com/oobabooga/text-generation-webui working, but instead of being able to answer questions, it revolves around the concept of generating text.

In your case and with ChatGPT, does it provide output based on the data you feed it? If so, is there anything involved in training the model to use your data?

I am trying to get a sense of what is going on.

How do you know if the output that was sent to customers (who believe they are getting accurate results from a knowledgable human being, BTW) is correct?

How long did this take?

Did you consider something like openrefine or fuzzy matching / levenshtein distance?

Seems like a common data cleaning ask with a small amount of data.
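For reference, the non-ML baseline being suggested is only a few lines with the standard library's difflib (the product names and cutoff below are made up for illustration):

```python
# Fuzzy string matching baseline: pick the catalog entry with the highest
# SequenceMatcher ratio against the (lowercased) query.
import difflib

products = ["BrandX Matte Face Moisturizer", "BrandX Glossy Lip Balm"]

def fuzzy_match(query, choices, cutoff=0.4):
    scored = [(difflib.SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
              for c in choices]
    best = max(scored)
    return best[1] if best[0] >= cutoff else None

print(fuzzy_match("brandx matte face mosturizer", products))
# BrandX Matte Face Moisturizer
```

This handles typos well, but (as OP notes downthread) it can't bridge genuinely different wordings like "low-sheen" vs "matte", which is where embeddings earn their keep.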

I played around with that and pgtrgm (https://www.postgresql.org/docs/current/pgtrgm.html) a bit but unfortunately didn't have great results. I did do a bunch of manual data cleaning though, and also had some overriding logic if certain keywords match it would avoid semantic search and default to a result (for common ones).

I fail to see how this data cleaning could not be solved with proper tokenization and some distance measure. The amount of power used for those API calls is slightly obscene.

edit: Don't want to rant. It's not a bad post, and I'm sure there are many far more wasteful examples than this.

I'm disappointed that the article doesn’t explain what they ended up doing.

Why not just use GPT-3 or even GPT-2 classifier API? No generative AI needed

Looks like the Classifications API is deprecated: https://platform.openai.com/docs/guides/classifications

Embeddings are superior (classifiers are deprecated).

> 100s of human hours saved

wait till it's thousands, millions, billions . . .

What is pinecone and is there a link to a website?

Pinecone is a vector database: https://www.pinecone.io/

Awesome to see the integration between Klaviyo automation and GPT-3 AI and using it to streamline your girlfriend's processes. Keep up the fantastic work!

This is pure spam.

Loved reading it. Will feature this in my newsletter on AI Tools and learning resources, AI Brews https://aibrews.com
