Hacker News

Hi Simon! Big fan of your blog.

I actually generate two summaries: one is part of the ingestion pipeline and used for indexing and embedding, and another is generated on-the-fly based on user queries (the goal there is to "reconcile" the user query with each individual item being suggested).
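To make the two-summary setup concrete, here's a minimal sketch. The function names and prompts are illustrative, not my actual code, and the model call is abstracted as any callable that maps a prompt to text (e.g. a thin wrapper around the OpenAI chat completions API):

```python
def ingest_summary(page_text: str, llm) -> str:
    """One-time summary generated during ingestion; used for indexing and embedding."""
    prompt = (
        "Summarize the following page for a search index. "
        "Be concise and factual.\n\n" + page_text
    )
    return llm(prompt)

def query_summary(user_query: str, item_summary: str, llm) -> str:
    """On-the-fly summary that reconciles the user's query with one suggested item.

    Note it operates on the short ingestion summary, not the raw page,
    which keeps the token count (and cost) low.
    """
    prompt = (
        f"User query: {user_query}\n"
        f"Item summary: {item_summary}\n"
        "Explain in one or two sentences how this item relates to the query."
    )
    return llm(prompt)
```

Passing the model in as a callable also makes the pipeline easy to test with a stub before wiring up a real API client.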

I use GPT-3.5 Turbo, which works well enough for that purpose. Generating the original summaries from raw page contents came down to about $0.01 per item. That could add up quickly, but I was lucky enough to have some OpenAI credits lying around, so I didn't have to think much about it or explore alternative options.
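As a back-of-envelope check on that $0.01 figure (assuming the gpt-3.5-turbo rates from around that time, roughly $0.0015 per 1K input tokens and $0.002 per 1K output tokens; check current pricing before relying on these numbers):

```python
def summary_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = 0.0015, out_rate: float = 0.002) -> float:
    """Dollar cost of one summarization call at assumed per-1K-token rates."""
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

# A raw page of ~6,000 tokens producing a ~200-token summary:
# 6.0 * 0.0015 + 0.2 * 0.002 = 0.0094, i.e. roughly $0.01 per item.
cost = summary_cost(6000, 200)
```

This also shows why the on-the-fly summaries are so much cheaper: a few hundred input tokens from a pre-made summary instead of thousands from a raw page.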

GPT-4 would produce nicer summaries for the user-facing portion, but its latency and cost are too high for production. With GPT-3.5, however, the on-the-fly summaries are very cheap, since they require only a few tokens (they operate off the original ingestion summaries mentioned above).

Worth noting that I processed stories in descending order of score and skipped anything under 50 points, which substantially reduced the number of tokens to process.
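That filtering step is simple but worth spelling out, since it bounds the whole pipeline's cost. A sketch (field names are illustrative, loosely following the HN API's item shape):

```python
MIN_SCORE = 50  # stories below this threshold are never summarized

def stories_to_process(stories):
    """Keep stories with score >= MIN_SCORE, highest-scoring first."""
    eligible = [s for s in stories if s["score"] >= MIN_SCORE]
    return sorted(eligible, key=lambda s: s["score"], reverse=True)
```

Processing highest-scoring first also means that if you stop early (e.g. credits run out), you've covered the most-read items.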



