Hey HN, yesterday we shared a post[0] about how you can store OpenAI embeddings in Postgres with pgvector.
This is a follow-up demonstration, adding a "ChatGPT" interface to our own docs. To use it, you can go to https://supabase.com/docs and then type "cmd + /". This will pull up a "Clippy" interface, where you can ask it questions about Supabase (sorry in advance to mobile users).
In "Show HN" spirit, it's very hacky/MVP, so I expect it will break. We'd value any feedback.
Note that pgvector isn't supported on any of the large cloud providers' hosted Postgres offerings, other than Supabase. https://github.com/pgvector/pgvector#hosted-postgres has instructions on how to add your voice to request it to be added!
(It does seem that the ancient https://github.com/eulerto/pg_similarity is supported by RDS and Google Cloud, but it's hard to tell whether its performance characteristics received anything close to the attention that seems to have gone into pgvector's design.)
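For anyone who missed the earlier post, here is a rough TypeScript sketch of the flow behind Clippy, assuming supabase-js, the openai Node client, and a hypothetical "match_page_sections" Postgres function that wraps a pgvector similarity query (the function name and table are illustrative, not necessarily what we ship):

    // Sketch only: assumes a "page_sections" table with a pgvector "embedding"
    // column and a hypothetical "match_page_sections" SQL function that runs
    // the similarity search server-side.
    import { createClient } from '@supabase/supabase-js'
    import { Configuration, OpenAIApi } from 'openai'

    const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!)
    const openai = new OpenAIApi(new Configuration({ apiKey: process.env.OPENAI_KEY }))

    export async function searchDocs(query: string) {
      // 1. Embed the user's question
      const embeddingRes = await openai.createEmbedding({
        model: 'text-embedding-ada-002',
        input: query,
      })
      const [{ embedding }] = embeddingRes.data.data

      // 2. Find the most similar doc sections via the pgvector-backed function
      const { data: sections, error } = await supabase.rpc('match_page_sections', {
        query_embedding: embedding,
        match_count: 5,
      })
      if (error) throw error
      return sections
    }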
Feedback: Please add a button. This seems super useful, but unfortunately I can't test it. I'm on a non-US layout and can't use the shortcut "cmd + /". On my keyboard a "/" is opt + 7, and using "opt + cmd + 7" unsurprisingly does nothing.
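For what it's worth, a visible trigger plus matching on the character the keystroke produces (event.key) tends to sidestep layout differences; a minimal TypeScript sketch (the element ids and the openClippy helper are made up, not Supabase's actual code):

    // Sketch only, not Supabase's actual code. A visible button plus a
    // listener that checks the produced character (event.key) is more robust
    // across keyboard layouts than a fixed physical-key combination.
    function openClippy() {
      // hypothetical: reveal the Clippy dialog
      document.querySelector<HTMLElement>('#clippy-dialog')?.removeAttribute('hidden')
    }

    // Visible fallback button for anyone who can't type the shortcut
    document.querySelector<HTMLButtonElement>('#ask-clippy')?.addEventListener('click', openClippy)

    // Keyboard shortcut: cmd+/ or ctrl+/, matched on the character produced
    window.addEventListener('keydown', (event) => {
      if ((event.metaKey || event.ctrlKey) && event.key === '/') {
        event.preventDefault()
        openClippy()
      }
    })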
That's so interesting. I'm sure other companies will integrate GPT into their developer documentation. We have built a similar tool called Corpora: https://askcorpora.com. It allows users to upload their PDF files, search through them in natural language, and perform Q&A. It uses the same technology that ClippyGPT uses. Do you have any feedback for us?
I remember looking for Android integrations a few days ago but only getting third-party repos, so I thought maybe this interface could point me to the most suitable one.
With typical ChatGPT confidence it pointed me to the Supabase Android SDK at
https://supabase.com/docs/android which is 404 :)
awesome velocity - i could have sworn this was another of your famous Launch Weeks but it's just a regular week for you now lol
there's been a lot of these bots coming out, and some of the poorly implemented ones are probably going to make this endeavor look bad, so i'm curious about your thoughts - what quality control/testing approach do you think makes sense for an unbounded chat bot like this?
in our MVP checklist we had "XSS testing" and "Prompt injection", the latter somewhat cursory because there's no feasible way to prevent it right now. We found a lot of ways to "break out" of the prompt (which is also very visible since we're open source). Luckily, prompt break-outs are relatively benign (as long as we have spend-caps on OpenAI).
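On the XSS item, the gist is to treat the model's answer as untrusted input before it touches the page; a minimal sketch in TypeScript (the helper names are illustrative, not our actual code):

    // Sketch only: render the answer as text, never as raw HTML, so a
    // prompt-injected "<script>" can't execute in the docs page.
    function renderAnswer(container: HTMLElement, answer: string) {
      container.textContent = answer
    }

    // If markdown rendering is wanted later, escape first and convert the
    // escaped string instead of the raw model output.
    function escapeHtml(untrusted: string): string {
      return untrusted
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
    }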
The biggest win for companies like us is that something like this becomes ubiquitous, so that people get bored of prompt-hacking and then just use the tool like they are supposed to. Over time we'll add rate-limiting, caching, and other hardening.
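To make the hardening direction concrete, a naive TypeScript sketch of the kind of rate limiting and caching we mean (the window size, limit, and in-memory storage are made up for illustration; a real version would use a shared store):

    // Sketch only: fixed-window rate limiter plus an answer cache keyed by the
    // normalized question. Process memory is fine for a demo; production would
    // want Postgres or Redis so limits survive restarts and scale across nodes.
    const WINDOW_MS = 60_000
    const MAX_REQUESTS_PER_WINDOW = 10

    const hits = new Map<string, { count: number; windowStart: number }>()
    const answerCache = new Map<string, string>()

    export function allowRequest(clientId: string): boolean {
      const now = Date.now()
      const entry = hits.get(clientId)
      if (!entry || now - entry.windowStart > WINDOW_MS) {
        hits.set(clientId, { count: 1, windowStart: now })
        return true
      }
      entry.count += 1
      return entry.count <= MAX_REQUESTS_PER_WINDOW
    }

    export function getCachedAnswer(question: string): string | undefined {
      return answerCache.get(question.trim().toLowerCase())
    }

    export function cacheAnswer(question: string, answer: string): void {
      answerCache.set(question.trim().toLowerCase(), answer)
    }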
I’m actually working on an open source project to mitigate prompt injections, handle caching, etc., and I’m collecting instances of prompt misuse. I’d love to see your findings. Did you document them anywhere that I can review?
it looks like we're getting 1800 requests per hour - about $5/hour, which is pretty cheap considering the traffic this blog post is seeing. That said, only desktop users who have a keyboard and have read the blog post can use it. If we made it visible to everyone I think we would quickly run into API limits.
$5 for 1800 requests is about $0.0028 per request; text-davinci-003 is priced at $0.02/1k tokens, so the average request is roughly 140 tokens. That is far below my expectation.
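Spelled out in TypeScript, just restating the arithmetic above:

    // Back-of-the-envelope check of the figures above.
    const dollarsPerHour = 5
    const requestsPerHour = 1800
    const davinciPricePer1kTokens = 0.02 // text-davinci-003, USD

    const costPerRequest = dollarsPerHour / requestsPerHour                         // ~0.0028 USD
    const avgTokensPerRequest = (costPerRequest / davinciPricePer1kTokens) * 1000   // ~139 tokens

    console.log(costPerRequest.toFixed(4), Math.round(avgTokensPerRequest))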
Awesome stuff! If anyone is interested in a hosted and managed version of this, let us know! We can already do this on top of any set of information (not just .mdx) and expose it either as an API or through production-ready Slack and Discord bots :)
Very cool! I'm looking forward to "smart search bars" (coined by me) becoming commonplace, where you can search for things and it returns both the relevant knowledge-base/web pages and its own take based on the results, context, and other sources.
the neat thing about the vector implementation is that we can immediately return the relevant pages while GPT formulates an answer, which gives this a hybrid "search + ask" function. We'll aim to do that in the next iteration
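A rough sketch of that "search + ask" shape in TypeScript (the helper signatures are placeholders, just to make the idea concrete):

    // Sketch only: the vector search is fast, so its results can be rendered
    // as plain search hits right away, while the slower completion that uses
    // those same sections as context fills in the generated answer afterwards.
    type Section = { path: string; content: string }

    async function searchAndAsk(
      query: string,
      searchDocs: (q: string) => Promise<Section[]>,          // pgvector similarity search
      askGpt: (q: string, ctx: Section[]) => Promise<string>, // completion grounded in the sections
      onResults: (sections: Section[]) => void,
      onAnswer: (answer: string) => void,
    ) {
      const sections = await searchDocs(query)
      onResults(sections)                          // "search": show relevant pages immediately

      const answer = await askGpt(query, sections)
      onAnswer(answer)                             // "ask": show the answer when GPT finishes
    }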
I think for code it will be pretty hard, unless the writer gives explicit permission to train the model on the copyrighted data, like the case of Copilot on GitHub.
And I don't think it is fair use at all. Imagine a company like OpenAI training the model on its own internal docs and code; then you'd be able to ask the model to replicate ChatGPT and Copilot, or even closed software like Photoshop.
I'm not talking about Supabase Clippy, but more about training models on copyrighted data without asking for permission (like private copyrighted code on GitHub, for example, and yes, I don't call that fair use).
Supabase Clippy uses the same trained models as Copilot (OpenAI's GPT family of large language models), which were trained on all of the code on GitHub without asking permission and without regard to copyright license.
These tools are being released under the assumption that training large language models will be found to be fair use of any copyrighted works.
Are y’all starting to see the arguments for why the model and the outputs of the model are two different issues: that the models themselves, and the products built on top of them, will be considered fair use, and that the liability for copyright infringement lies completely with the person using the tool?
Please explain to me what's Supabase, but make it rhyme.
Supabase is a great place,
For hosting your database.
It's open source and free,
So you can use it with glee.
It's simple to get started,
No infrastructure to be charted.
It's a hosted platform,
So you can use it with no qualm.
fun anecdote: we have a bunch of integration guides (Supabase + some other product), and in the first few iterations we found that Clippy was recommending those products instead of Supabase. e.g. "how can I get started with auth" would recommend trying out supertokens (another great open source auth solution)
Great, not only is ChatGPT better at rhyming than me, it also sounds more sophisticated while doing it. LLMs fail to pass for more than sophomoric when viewed by an expert, but to the mediocre they expose one's shortcomings. Maybe that's because they sort of are big giant averaging machines.
[0] Storing OpenAI embeddings in Postgres: https://news.ycombinator.com/item?id=34684593