
Personally I use llama3.1:8b or mistral-nemo:latest, which have decent context windows (even if smaller than the commercial ones, usually). I'm also working on a token calculator / content-splitting method, but it's still very early.
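For the curious, a minimal sketch of what the splitting step could look like. This uses tiktoken's cl100k_base encoding as a rough proxy for the model's real tokenizer, and the chunk size / overlap numbers are made up for illustration, not my actual implementation:

    import tiktoken

    def chunk_text(text, max_tokens=8192, overlap=128):
        # cl100k_base is a rough proxy; llama models use their own tokenizer
        enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(text)
        chunks = []
        # slide a window so consecutive chunks share `overlap` tokens
        step = max_tokens - overlap
        for start in range(0, len(tokens), step):
            window = tokens[start:start + max_tokens]
            chunks.append(enc.decode(window))
        return chunks

The overlap keeps a bit of shared context between consecutive chunks so sentences cut at a boundary aren't lost entirely.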



why not llama3.2:3B? it has a fairly large context window too


I assume because the 8B model is smarter than the 3B model; the 8B outperforms it on almost every benchmark: https://huggingface.co/meta-llama/Llama-3.2-3B

If you have the compute, might as well use the better model :)
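And since both are just ollama tags, comparing them yourself is a couple of lines with the ollama Python client (the prompt and model tags here are arbitrary examples):

    import ollama

    prompt = "Summarize: the quick brown fox jumps over the lazy dog."
    for model in ("llama3.2:3b", "llama3.1:8b"):
        # same request against each tag; compare the answers side by side
        resp = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
        print(model, "->", resp["message"]["content"])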

The 3.2 series wasn't the kind of leap that 3.0 -> 3.1 was in terms of intelligence; it was just:

1. Meta releasing multimodal vision models for the first time (11B and 90B), and

2. Meta releasing much smaller models than the 3.1 series (1B and 3B).



