
Could someone please summarize the differences (or similarities) between the LLM part and a TGWUI + llama.cpp setup with layers offloaded to the GPU?

Asking because an 8x7B Q4_K_M (25GB GGUF) doesn't seem to be "ultra-low latency" on my 12GB VRAM + RAM. Like, at all. I can imagine getting that latency with a 7-13GB model (I did, but... it's a small model), or with 2x P40 or something. I'm not sure what assumptions the README makes. Am I missing something? Can you try it without the TTS part?
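
(For reference, the setup I'm comparing against is roughly the sketch below, via llama-cpp-python; the model path is a placeholder and n_gpu_layers is just whatever fits in 12GB.)

    # Rough sketch of the TGWUI/llama.cpp-style setup I mean (llama-cpp-python).
    # Model path and n_gpu_layers are placeholders; tune for your VRAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # ~25GB GGUF
        n_gpu_layers=12,  # only part of the model fits in 12GB VRAM, rest stays in RAM
        n_ctx=4096,
    )

    out = llm("Q: Why is partial offloading slow for a 25GB model? A:", max_tokens=128)
    print(out["choices"][0]["text"])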




The video example uses Phi-2, which is a 2.7B-parameter network. I think that's part of how they're achieving the low latency here!
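
(If you want to sanity-check that on your own hardware, here's a rough timing sketch with plain Hugging Face transformers rather than the TensorRT-LLM path this project uses, so treat the numbers as a ballpark only.)

    # Rough Phi-2 latency check with plain transformers (not TensorRT-LLM).
    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("microsoft/phi-2")
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/phi-2", torch_dtype=torch.float16, device_map="cuda"
    )

    inputs = tok("Explain latency in one sentence.", return_tensors="pt").to("cuda")
    start = time.time()
    out = model.generate(**inputs, max_new_tokens=64)
    elapsed = time.time() - start
    n_new = out.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{n_new} tokens in {elapsed:.2f}s ({n_new / elapsed:.1f} tok/s)")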

Has anybody fine-tuned Phi-2? I haven't found any good resources for that yet.
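
(The closest generic recipe I can think of is LoRA via peft, something like the untested sketch below; the target_modules names are my guess at Phi-2's layer naming.)

    # Untested sketch: generic LoRA fine-tuning setup for Phi-2 (transformers + peft).
    # The target_modules names are assumptions about Phi-2's layer naming.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    tok = AutoTokenizer.from_pretrained("microsoft/phi-2")
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/phi-2", torch_dtype=torch.float16, device_map="cuda"
    )

    lora = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed names
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()
    # ...then train with transformers.Trainer or trl's SFTTrainer on your dataset.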


We tested https://huggingface.co/cognitivecomputations/dolphin-2_6-phi... as well; on some tasks it performs better. That said, you can also use Mistral; we support a few models through TensorRT-LLM.
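
(As a rough illustration only: recent TensorRT-LLM releases expose a high-level LLM API along these lines. This is a sketch, not necessarily how the pipeline here is wired up, and exact argument names vary between releases; the model name is just an example.)

    # Sketch of TensorRT-LLM's high-level LLM API (recent releases).
    # Exact names may differ by version; the model below is an example only.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    for output in llm.generate(["What is low-latency inference?"], params):
        print(output.outputs[0].text)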



