"A (tiny) AI model can not cite sources, it can only hallucinate citations."
I don't think that's necessarily true, as long as you build the system around it well. It would look something like this:
1. User asks a question. LLM extracts key concepts from that question to use as search terms.
2. LLM triggers a search of Wikipedia, getting back snippets of pages, each with their page identifier.
3. LLM is fed those snippets along with the user's question and instructions to cite pages that it uses content from.
4. LLM generates a response which includes formatted citations. This response may be complete garbage, but...
5. Your code can at least confirm that the citations correspond to the pages that you fed into the LLM.
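Step 5 is the part your code can enforce deterministically, without trusting the model at all. A minimal sketch of that check, assuming (hypothetically) you instructed the model in step 3 to mark citations with a `[[page:ID]]` convention — any format works as long as it's easy to parse:

```python
import re

# Matches the hypothetical [[page:ID]] citation format the prompt asks for.
CITATION_PATTERN = re.compile(r"\[\[page:([^\]]+)\]\]")

def invalid_citations(response: str, allowed_page_ids: set[str]) -> list[str]:
    """Return any cited page IDs that were NOT among the snippets
    actually fed to the LLM in step 3 — i.e. likely hallucinations."""
    cited = CITATION_PATTERN.findall(response)
    return [page_id for page_id in cited if page_id not in allowed_page_ids]

# The IDs of the Wikipedia snippets retrieved in step 2:
allowed = {"Douglas_Adams", "Hitchhiker's_Guide"}

response = (
    "Douglas Adams wrote it [[page:Douglas_Adams]], "
    "and it was first a radio series [[page:BBC_Radio_4]]."
)

print(invalid_citations(response, allowed))  # → ["BBC_Radio_4"]
```

If the list comes back non-empty you can reject or regenerate the answer, or strip the bad citations. This doesn't prove the answer is *correct* — the model can still misrepresent a page it genuinely saw — but it guarantees every citation points at real retrieved content.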
I've seen this approach work well with larger models. The open question is if it could work with smaller ones.
The 7B models (Mistral and its variants in particular) are getting VERY effective. I'm confident they could mostly work for the above sequence... and you can just about run a 7B model on a phone.
The bigger question for me is whether you could get this to work with a 3B model, since those are much more mobile-device friendly than 7B models.