Show HN: Velvet – Store OpenAI requests in your own DB (usevelvet.com)
109 points by elawler24 16 days ago | 55 comments
Hey HN! We’re Emma and Chris, founders of Velvet (https://www.usevelvet.com).

Velvet proxies OpenAI calls and stores the requests and responses in your PostgreSQL database. That way, you can analyze logs with SQL (instead of a clunky UI). You can also set headers to add caching and metadata (for analysis).
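
To make the setup concrete, here's roughly what it looks like from Python. The proxy URL and header names below are illustrative placeholders, not the exact ones (check the docs):

  # Sketch only: point the OpenAI SDK at the proxy and attach metadata.
  # The base_url and "velvet-*" header names are placeholders.
  from openai import OpenAI

  client = OpenAI(
      base_url="https://gateway.usevelvet.com/v1",  # placeholder proxy URL
      api_key="sk-...",                             # your normal OpenAI key
      default_headers={"velvet-auth": "your-velvet-api-key"},
  )

  resp = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": "Hello"}],
      # per-request metadata that becomes queryable alongside the log
      extra_headers={"velvet-metadata": '{"user_id": "123"}'},
  )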

Backstory: We started by building some more general AI data tools (like a text-to-SQL editor). We were frustrated by the lack of basic LLM infrastructure, so we ended up pivoting to focus on the tooling we wanted. Many existing apps, like Helicone, were hard to use for power users. We just wanted a database.

Scale: We’ve already warehoused 50M requests for customers and have optimized the platform for scale and latency. The proxy is built on Cloudflare Workers, and the added latency is negligible. We’ve also built some “yak shaving” features that turned out to be really complex, such as decomposing OpenAI Batch API requests so you can track each log individually. One of our early customers (https://usefind.ai/) makes millions of OpenAI requests per day, up to 1,500 requests per second.

Vision: We’re trying to build development tools that have as little UI as possible, that can be controlled entirely with headers and code. We also want to blend cloud and on-prem for the best of both worlds — allowing for both automatic updates and complete data ownership.

Here are some things you can do with Velvet logs:

- Observe requests, responses, and latency

- Analyze costs by metadata, such as user ID (rough query sketch below)

- Track batch progress and speed

- Evaluate model changes

- Export datasets for fine-tuning of gpt-4o-mini

(this video shows how to do each of those: https://www.youtube.com/watch?v=KaFkRi5ESi8)
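
For example, the cost-by-user-ID analysis above ends up being a plain SQL query over the logged JSON. A rough sketch via psycopg2, where the table and column names are illustrative (your schema will differ):

  # Illustrative only: "llm_logs", "request", and "response" are placeholder
  # names for the logs table and its JSON columns.
  import psycopg2

  conn = psycopg2.connect("postgresql://user:pass@localhost/velvet_logs")
  with conn.cursor() as cur:
      cur.execute("""
          SELECT request->'metadata'->>'user_id'                AS user_id,
                 SUM((response->'usage'->>'total_tokens')::int) AS total_tokens
          FROM llm_logs
          GROUP BY 1
          ORDER BY total_tokens DESC
          LIMIT 20;
      """)
      for user_id, total_tokens in cur.fetchall():
          print(user_id, total_tokens)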

--

To see how it works, try chatting with our demo app that you can use without logging in: https://www.usevelvet.com/sandbox

Setting up your own proxy is 2 lines of code and takes ~5 mins.

Try it out and let us know what you think!




Seems neat - I'm not sure if you do anything like this, but one thing that would be useful with RAG apps (esp at big scales) is vector-based search over cache contents. What I mean is that users can phrase the same question (which has the same answer) in tons of different ways. If I could pass a raw user query into your cache and get back the end result for a previously computed query (even if the current phrasing is a bit different from the cached phrasing), then not only would I avoid having to submit a new OpenAI call, but I could also avoid having to run my entire RAG pipeline. So kind of like a "meta-RAG" system that avoids having to run the actual RAG system for queries that are sufficiently similar to a cached query, or like an "approximate" cache.
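
Roughly what I have in mind, as an untested sketch (the embedding model and the 0.9 threshold are just placeholders):

  # Untested sketch of an "approximate" cache keyed on embedding similarity.
  import numpy as np
  from openai import OpenAI

  client = OpenAI()
  cache = []  # list of (embedding, final_answer) pairs from past RAG runs

  def embed(text: str) -> np.ndarray:
      resp = client.embeddings.create(model="text-embedding-3-small", input=text)
      return np.array(resp.data[0].embedding)

  def lookup(query: str, threshold: float = 0.9):
      """Return a previously computed answer if an old query is similar enough."""
      q = embed(query)
      for vec, answer in cache:
          sim = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
          if sim >= threshold:
              return answer  # cache hit: skip the whole RAG pipeline
      return None

  def store(query: str, answer: str):
      cache.append((embed(query), answer))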


I was impressed by Upstash's approach to something similar with their "Semantic Cache".

https://github.com/upstash/semantic-cache

  "Semantic Cache is a tool for caching natural text based on semantic similarity. It's ideal for any task that involves querying or retrieving information based on meaning, such as natural language classification or caching AI responses. Two pieces of text can be similar but not identical (e.g., "great places to check out in Spain" vs. "best places to visit in Spain"). Traditional caching doesn't recognize this semantic similarity and misses opportunities for reuse."


I strongly advise not relying on embedding distance alone for it because it'll match these two:

1. great places to check out in Spain

2. great places to check out in northern Spain

Logically the two are not the same, and they could in fact be very different despite their semantic similarity. Your users will be frustrated and will hate you for it. If an LLM validates the two as being the same, then it's fine, but not otherwise.


I agree, a naive approach to approximate caching would probably not work for most use cases.

I'm speculating here, but I wonder if you could use a two-stage pipeline for cache retrieval (kinda like the distance search + reranker model technique used by lots of RAG pipelines). Maybe it would be possible to fine-tune a custom reranker model to only output True if 2 queries are semantically equivalent rather than just similar. So the hypothetical model would output True for "how to change the oil" vs. "how to replace the oil" but would output False in your Spain example. In this case you'd do distance-based retrieval first using the normal vector DB techniques, and then use your custom reranker to validate that the potential cache hits are actual hits.
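
Something like this for the second stage, with a plain chat model standing in for the hypothetical fine-tuned reranker:

  # Stage 2 sketch: validate that two queries ask the *same* question,
  # not just a similar one. A general chat model stands in for the
  # hypothetical fine-tuned reranker.
  from openai import OpenAI

  client = OpenAI()

  def same_question(a: str, b: str) -> bool:
      resp = client.chat.completions.create(
          model="gpt-4o-mini",
          temperature=0,
          messages=[{
              "role": "user",
              "content": (
                  "Do these two queries ask for exactly the same information? "
                  f"Answer only True or False.\n1. {a}\n2. {b}"
              ),
          }],
      )
      return resp.choices[0].message.content.strip().lower().startswith("true")

  # same_question("how to change the oil", "how to replace the oil")  -> True
  # same_question("places in Spain", "places in northern Spain")      -> False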


Any LLM can output that, but yes, a tuned LLM can get by with a shorter prompt.


A hybrid search approach might help, like combining vector similarity scores with e.g. BM25 scores.

Shameless plug (FOSS): https://github.com/jankovicsandras/plpgsql_bm25 (Okapi BM25 search implemented in PL/pgSQL for Postgres).
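
The blending itself can be as simple as a weighted sum of normalized scores. A rough Python sketch, with rank_bm25 standing in for plpgsql_bm25 and an arbitrary 0.5 weight:

  # Rough hybrid-scoring sketch: blend cosine similarity with BM25.
  import numpy as np
  from rank_bm25 import BM25Okapi

  def hybrid_scores(query: str, docs: list[str], doc_vecs: np.ndarray,
                    query_vec: np.ndarray, alpha: float = 0.5) -> np.ndarray:
      bm25 = BM25Okapi([d.lower().split() for d in docs])
      bm25_scores = np.array(bm25.get_scores(query.lower().split()))
      if bm25_scores.max() > 0:
          bm25_scores = bm25_scores / bm25_scores.max()  # normalize to 0..1
      cos = doc_vecs @ query_vec / (
          np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
      return alpha * cos + (1 - alpha) * bm25_scores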


That would totally destroy the user experience. Users change their query so they can get a refined result, not so they get the same tired result.


Even across users it’s a terrible idea.

Even in the simplest of applications where all you’re doing is passing “last user query” + “retrieved articles” into openAI (and nothing else that is different between users, like previous queries or user data that may be necessary to answer), this will be a bad experience in many cases.

Queries A and B may have similar embeddings (similar topic) and it may be correct to retrieve the same articles for context (which you could cache), but they can still be different questions with different correct answers.


Depends on the scenario. In a threaded query, or multiple queries from the same user - you’d want different outputs. If 20 different users are looking for the same result - a cache would return the right answer immediately for no marginal cost.


That's not the use case of the parent comment:

> for queries that are sufficiently similar


Thanks for the detail! This is a use case we plan to support, and it will be configurable (for when you don’t want it). Some of our customers run into this when different users ask a similar query - “NY-based consumer founders” vs “consumer founders in NY”.


A cache is better when it's local rather than on the web. And I certainly don't need to pay anyone to cache local request responses.


How would one achieve something similarly locally, short of just running a proxy and stuffing the request/response pairs into a DB? I'm sure it wouldn't be too terribly hard to write something, but I figure something open source already exists for OpenAI-compatible APIs.
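
The naive version I'm picturing is something like this untested sketch (FastAPI + SQLite, forwarding straight to api.openai.com):

  # Untested sketch: a local proxy that forwards chat completions to OpenAI
  # and stuffs the request/response pairs into SQLite.
  import json, os, sqlite3
  import httpx
  from fastapi import FastAPI, Request

  app = FastAPI()
  db = sqlite3.connect("llm_logs.db", check_same_thread=False)
  db.execute("CREATE TABLE IF NOT EXISTS logs "
             "(ts DATETIME DEFAULT CURRENT_TIMESTAMP, request TEXT, response TEXT)")

  @app.post("/v1/chat/completions")
  async def proxy(request: Request):
      body = await request.json()
      async with httpx.AsyncClient(timeout=120) as http:
          upstream = await http.post(
              "https://api.openai.com/v1/chat/completions",
              json=body,
              headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
          )
      data = upstream.json()
      db.execute("INSERT INTO logs (request, response) VALUES (?, ?)",
                 (json.dumps(body), json.dumps(data)))
      db.commit()
      return data

  # Then point the OpenAI SDK at base_url="http://localhost:8000/v1".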


Recently did this workflow.

Started with an nginx proxy with rules to cache based on url/params. Wanted more control over it, explored the Lua/Redis APIs, and opted to build an app to be a little smarter about what I wanted. The extra EC2 cost is negligible compared to the cache savings.


Yes! It's amazing how many things you can do with lua in nginx. I had a server that served static websites where the files and the certificates for each website were stored in a bucket. Over 20k websites with 220ms overhead if the certificate wasn't cached.


There are any number of databases and language-specific caching libraries. A custom solution or the use of a proxy isn't necessary.


As I understand it, your data remains local, as it leverages your own database.


Why do I even have to use this SaaS? This should be an open source lib, or just a practice that I implement myself.


Implement it yourself then and save your $$ at the expense of your time.


If you factor in dealing with somebody's black box code 6 months into a project, you'll realise you're saving both money and time.


It's not as complicated as you make it. There are numerous caching libraries, and databases have been a thing for decades.


Like, this is not a big thing to implement, that's my point. There are already libraries like OpenLLMetry that sink to a DB. We are doing something like this already.


Yes, the ol' Dropbox "you can already build such a system yourself quite trivially by getting an FTP account" comment. Even after 17 years, people still feel the need to make this point.


So they can charge you for it.


Congrats on the launch! I love the devex here and things you're focusing on.

Have you had thoughts on how you might integrate data from an upstream RAG pipeline, say as part of a distributed trace, to aid in debugging the core "am I talking to the LLM the right way" use case?


Thanks! You can layer on as much detail as you need by including meta tags in the header, which is useful for tracing RAG and agent pipelines. But would love to understand your particular RAG setup and whether that gives you enough granularity. Feel free to email me too - emma@usevelvet.com


Looks cool. Just out of curiosity, how does this compare to other OpenLLMetry-type observation tools like Arize, Traceloop, LangSmith, LlamaTrace, etc.?

From personal experience, they're all pretty simple to install and use. Then mileage varies in analyzing and taking action on the logs. Does Velvet offer something the others do not?

For my client projects, I've been leaning towards open source platforms like Arize so clients have the option of pulling it inhouse if needed. Most often for HIPAA requirements.

RAG support would be great to add to Velvet. Specifically pgvector and pinecone traces. But maybe Velvet already supports it and I missed it in the quick read of the docs.


Velvet takes <5 mins to get set up in any language, which is why we started as a proxy. We offer managed / custom deployments for enterprise customers, so we can support your client requirements.

We warehouse logs directly to your DB, so you can do whatever you want with the data. Build company ops on top of the DB, run your own evals, join with other tables, hash data, etc.

We’re focusing on backend eng workflows so it’s simple to run continuous monitoring, evals, and fine-tuning with any model. Our interface will focus on surfacing data and analytics to PMs and researchers.

For pgvector/pinecone RAG traces - you can start by including meta tags in the header. Those values will be queryable in the JSON object.

Curious to learn more though - feel free to email me at emma@usevelvet.com.


disclosure: founder/maintainer of Langfuse (OSS LLM application observability)

I believe proxy-based implementations like Velvet are excellent for getting started and solve for the immediate debugging use case; simply changing the base path of the OpenAI SDK makes things really simple (the other solutions mentioned typically require a few more minutes to set up).

At Langfuse (similarly to the other solutions mentioned above), we prioritize asynchronous and batched logging, which is often preferred for its scalability and zero impact on uptime and latency. We have developed numerous integrations (for OpenAI specifically, an SDK wrapper), and you can also use our SDKs and decorators to integrate with any LLM.
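
For OpenAI in Python it's roughly a drop-in import swap (see the docs for exact usage):

  # Roughly the drop-in integration: swap the import and logging happens
  # asynchronously in the background, outside the request path.
  from langfuse.openai import openai  # instead of `import openai`

  completion = openai.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": "Hello"}],
  )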

> For my client projects, I've been leaning towards open source platforms like Arize so clients have the option of pulling it inhouse if needed. Most often for HIPAA requirements.

I can echo this. We observe many self-hosted deployments in larger enterprises and HIPAA-related companies, thus we made it very simple to self-host Langfuse. Especially when PII is involved, self-hosting makes adopting an LLM observability tool much easier in larger teams.


Thanks! I'll give self hosting LangFuse a try for HIPAA projects. And happy to pay for cloud for other projects.

I don't understand the problem that's being solved here. At the scale you're talking about (e.g. millions of requests per day with FindAI), why would I want to house immutable log data inside a relational database, presumably alongside actual relational data that's critical to my app? It's only going to bog down the app for my users.

There are plenty of other solutions (examples include Presto, Athena, Redshift, or straight up jq over raw log files on disk) which are better suited for this use case. Storing log data in a relational DB is pretty much always an anti-pattern, in my experience.


Philip here from Find AI. We store our Velvet logs in a dedicated DB. It's postgres now, but we will probably move it to Clickhouse at some point. Our main app DB is in postgres, so everybody just knows how it works and all of our existing BI tools support it.

Here's a video about what we do with the data: https://www.youtube.com/watch?v=KaFkRi5ESi8


It's a standalone DB, just for LLM logging. Since it's your DB - you can configure data retention, and migrate data to an analytics DB / warehouse if cost or latency becomes a concern. And, we're happy to support whatever DB you require (ClickHouse, Big Query, Snowflake, etc) in a managed deployment.


I guess I should have elaborated to say that even if you're spinning up a new database expressly for this purpose (which I didn't see specifically called out in your docs anywhere as a best practice), you're starting off on the wrong foot. Maybe I'm old-school, but relational databases should be for relational data. This data isn't relational, it's write-once log data, and it belongs in files on disk, or in purpose-built analytics tools, if it gets too large to manage.


Got it. We can store logs to your purpose-built analytics DB of choice.

PostgreSQL (Neon) is our free self-serve offering because it’s easy to spin up quickly.


> we were frustrated by the lack of LLM infrastructure

May I ask what you specifically were frustrated about? Seems like there are more than enough solutions


There were plenty of UI-based low code platforms. But they required that we adopt new abstractions, use their UI, and log into 5 different tools (logging, observability, analytics, evals, fine-tuning) just to run basic software infra. We didn’t feel these would be long-term solutions, and just wanted the data in our own DB.


Very nice! I really like the design of the whole product, very clean and simple. Out of curiosity, do you have a designer, or did you take inspiration from any other products (for the landing page, dashboard, etc) when you were building this? I'm always curious how founders approach design these days.


I’m a product designer, so we tend to approach everything from first principles. Our aim is to keep as much complexity in code as possible, and only surface UI when it solves a problem for our users. We like using tools like Vercel and Supabase - so a lot of UI inspiration comes from the way they surface data views. The AI phase of the internet will likely be less UI focused, which allows for more integrated and simple design systems.


Nice! Sort of like Langsmith without the Langchain, which will be an attractive value proposition to many developers.


Howdy Erick from LangChain here! Just a quick clarification that LangSmith is designed to work great for folks not using LangChain as well :)

Check out our quickstart for an example of what that looks like! https://docs.smith.langchain.com/


TIL! LangSmith is great.


Does it support MySQL for queries/storage - or only PostgreSQL?

Also, caught a few typos on the site: https://triplechecker.com/s/o2d2iR/usevelvet.com?v=qv9Qk


We can support any database you need; PostgreSQL is the easiest way to get started.


Neat! I'd love to play with this, but the site doesn't open (403: Forbidden).


Might be a Cloudflare flag. Can you email me your IP address and we'll look into it? emma@usevelvet.com.


Error: Forbidden

403: Forbidden ID: bom1::k5dng-1727242244208-0aa02a53f334


This seems to require sharing our data we provide to OpenAI with yet another party. I don't see any zero-retention offering.


The self-serve version is hosted (it’s easy to try locally), but we offer managed deployments where you bring your own DB. In this case your data is 100% yours, in your PostgreSQL. That’s how Find AI uses Velvet.


Where is this mentioned? Is there a GitHub (etc.) somewhere so someone can use this without the hosted version?


Right now, it’s a managed service that we set up for you (we’re still a small team). Email me if you’re interested and I can share details - emma@usevelvet.com.


Interesting, seems more of an enterprise offering. Is it OpenAI-only for now, and do you plan to expand to other vendors? Anything open source?


I guess I don't understand what this is now. If it's just proxying requests and storing them in a DB, can't it be literally any API?


We could support any API. We’re focused on building data pipelines and tooling for LLM use cases.


We already support OpenAI and Anthropic endpoints, and can add models/endpoints quickly based on your requirements. We plan to expand to Llama and other self-hosted models soon. Do you have a specific model you want supported?



