Offline in-browser semantic search with only a small model sounds like it has some great potential!
I'd love to see a more compelling example / demo. The demo page you have is fairly short, and the content being searched seems semantically very narrow (it's all text about word embeddings!). As someone who hasn't spent any time trying out similar systems or building something like this, I find it difficult to come up with queries that would actually show how well this works, given how little content the demo page has. So overall, the demo isn't very compelling for me, and I'm left uncertain how effective this would be for a more realistic use case.
Basically if you had a better demo - larger and more varied example content, and a few example queries to show off what this can do - then I think the whole thing would be more immediately interesting (to me).
This makes sense, thank you very much for the feedback!
Another demo idea I had is an input field where the user enters a GitHub username; the demo then retrieves all of that user's starred repositories and enables semantic search over their titles and descriptions.
The main idea is that users will typically enter their own username, so they will already be familiar with the repositories they have starred, which should make for a better search experience when testing the component.
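A rough sketch of how the GitHub-stars demo could gather its searchable text, assuming the public GitHub REST endpoint `GET /users/{username}/starred`. The names `repoToSearchText` and `fetchStarredTexts` are made up for illustration and are not part of the component:

```typescript
// Hypothetical helpers for the GitHub-stars demo idea (not part of the component).
// Uses the public GitHub REST endpoint GET /users/{username}/starred.

// Only the response fields this sketch needs; the real objects have many more.
interface StarredRepo {
  full_name: string; // e.g. "owner/repo"
  description: string | null;
}

// Builds the text that would be embedded and searched for one repository.
function repoToSearchText(repo: StarredRepo): string {
  return repo.description ? `${repo.full_name}: ${repo.description}` : repo.full_name;
}

// Fetches every starred repo page by page (GitHub allows at most 100 per page).
// Unauthenticated requests are rate-limited, so a real demo may need a token.
async function fetchStarredTexts(username: string): Promise<string[]> {
  const perPage = 100;
  const texts: string[] = [];
  for (let page = 1; ; page++) {
    const url =
      `https://api.github.com/users/${encodeURIComponent(username)}/starred` +
      `?per_page=${perPage}&page=${page}`;
    const res = await fetch(url, { headers: { Accept: "application/vnd.github+json" } });
    if (!res.ok) throw new Error(`GitHub API request failed: ${res.status}`);
    const repos = (await res.json()) as StarredRepo[];
    texts.push(...repos.map(repoToSearchText));
    if (repos.length < perPage) break; // a short page means we've reached the end
  }
  return texts;
}
```

The resulting strings would then be the list the component searches over; keeping the repo name in each string means a result still makes sense even when the description is empty.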
Let me know if something else would work better for you.
Not the OP, but personally I'd prefer a search over some realistic content, like the React docs (https://react.dev/learn) or the Next docs (https://nextjs.org/docs), which I frequently struggle to search, especially for complex questions like "what kind of caching does Next.js do" or "what is the proper way to do an async client-side fetch".
--------
Some other feedback (just so I don't make a bunch of separate posts):
- Demo #2 404s
- It would be nice to have some way to highlight or summarize the relevant parts of search results, especially when they're "semantically" searched and each result is several dozen words. There's no easy way to skim the results, and it's not really clear to me (as a user) why the rankings are the way they are. It just looks like a bunch of reordered paragraphs that I still have to read all of.
- 20 MB is a LOT to ask a client to download just to run a search bar. Is there any way to run this as a server-side function / serverless?
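For the highlighting suggestion above, one possible approach (a sketch, not something the component is known to do) is to split each result into sentences, embed them with the same model, and highlight the sentence closest to the query. The helper names are hypothetical, and the embeddings are assumed to be computed elsewhere:

```typescript
// Hypothetical sketch: choose which sentence of a search result to highlight.
// Sentence and query embeddings are assumed to come from the same model the
// component uses; producing them is outside this sketch.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Naive sentence splitter: a run of non-terminators plus any trailing . ! or ?
function splitSentences(text: string): string[] {
  return (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Index of the sentence whose embedding is most similar to the query (-1 if none).
function bestSentenceIndex(sentenceVecs: number[][], queryVec: number[]): number {
  let best = -1;
  let bestScore = -Infinity;
  sentenceVecs.forEach((vec, i) => {
    const score = cosineSimilarity(vec, queryVec);
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return best;
}
```

Highlighting only the best-matching sentence would also give users a hint about why a paragraph ranked where it did, which addresses the "bunch of reordered paragraphs" complaint.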
Thanks for the feedback! I'll find time to make it better and retry the submission. Now I know it's actually possible to get on the first page :)
Regarding your last point: yes, 20MB is a lot, but the whole point is to have everything run on the client side, within a single component you install. You can already achieve the functionality you mention with MUI's standard Autocomplete. That being said, I'll look into ways to use smaller models.
I wasn't expecting it, but I actually got some votes <3, so here is a better description:
This is a React component for searching/sorting by meaning (not by "characters included in a string", like standard search).
It uses a small ML model that runs on the client side (inside the component!). When I say small, I mean ~20MB. The model is downloaded only once (on first use) and afterwards loaded from the browser's cache.
You can use this component to search and filter a dropdown list, or an external list (like the paragraphs of a webpage), by meaning. You can search sentences with sentences, not just with short words or substrings.
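To make "search by meaning" concrete, here is a minimal sketch of embedding-based ranking. It assumes each list item and the query have already been turned into vectors by an embedding model (how the component actually does this isn't shown here), and `rankByMeaning` is a hypothetical helper, not the component's API:

```typescript
// Hypothetical sketch of ranking by meaning rather than by substring match.
// Item and query vectors are assumed to be produced by an embedding model
// (not shown); rankByMeaning is an illustrative helper, not the component's API.

interface Ranked {
  text: string;
  score: number; // cosine similarity to the query, in [-1, 1]
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function rankByMeaning(
  items: string[],
  itemVecs: number[][],
  queryVec: number[],
  minScore = 0, // arbitrary threshold used to filter out weak matches
): Ranked[] {
  if (items.length !== itemVecs.length) {
    throw new Error("items and itemVecs must have the same length");
  }
  return items
    .map((text, i) => ({ text, score: cosineSimilarity(itemVecs[i], queryVec) }))
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score); // most similar first
}
```

Since the item vectors only need to be computed once, a setup like this would only have to embed the query on each search; `minScore` is an illustrative knob for how aggressively to filter versus just sort.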