Lately, I've been tinkering with llama.cpp and the ollama server. The speed of these tools caught my attention, even on my modest 4060 setup. I was quite impressed with the generation quality of models like Mistral.
At the same time, I was a bit frustrated: exploring a topic through a chat interface involves a lot of typing. I wanted a tool that not only gives a response but also generates a set of "suggestions" that can be explored further just by clicking.
My experience in front-end development is limited. Nonetheless, I tinkered together a small web app that does exactly that. It is built with Vue.js 3 and Vuetify.
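To give a rough idea of the "response plus clickable suggestions" pattern, here is a minimal Python sketch. It is not the actual llmbinge code (that is a Vue app and may work quite differently); the prompt wording, the `Suggestions:` marker, and the helper names `build_prompt`, `split_reply`, and `ask` are all assumptions made up for illustration. It assumes an ollama server running locally on its default port with a `mistral` model pulled.

```python
import json
import re
import urllib.request

# Default local endpoint of the ollama server (assumes a standard install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(topic: str, n: int = 5) -> str:
    """Ask for a short answer followed by a list of follow-up topics.

    The exact wording and the 'Suggestions:' marker are illustrative choices.
    """
    return (
        f"Explain the following topic briefly: {topic}\n\n"
        f"Then, under a line that says 'Suggestions:', list {n} related "
        "topics worth exploring next, one per line, each starting with '- '."
    )


def split_reply(reply: str) -> tuple[str, list[str]]:
    """Split the model reply into the answer text and the suggestion list."""
    head, _, tail = reply.partition("Suggestions:")
    suggestions = [
        m.group(1).strip()
        for line in tail.splitlines()
        if (m := re.match(r"\s*[-*]\s+(.*\S)", line))
    ]
    return head.strip(), suggestions


def ask(topic: str, model: str = "mistral") -> tuple[str, list[str]]:
    """Send one non-streaming request to ollama and parse the result."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(topic), "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["response"]
    return split_reply(reply)
```

In a UI, each returned suggestion would become a clickable item that calls `ask()` again with that suggestion as the new topic, so exploring further needs no typing. Models do not always follow the requested format, so a real app would want a fallback for replies without a usable list.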
Code: https://github.com/charstorm/llmbinge/