The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
As a point of comparison, the Llama 3.2 3B model is 6.5GB. The entirety of English Wikipedia text is 19GB (as compressed with an algorithm from 1996; newer compression formats might do better).
It's not a perfect comparison, and Llama does a lot more than English, but I would say 6.5GB of data can certainly contain a lot of knowledge.
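A rough back-of-the-envelope sketch of that comparison (the parameter count and bytes-per-weight here are assumptions, not measured values):

```python
# Back-of-the-envelope: why a "3B" model comes to roughly 6.5 GB on disk,
# and how that compares to the 19 GB compressed Wikipedia figure above.
params = 3.2e9           # Llama 3.2 3B has roughly 3.2 billion parameters (assumed)
bytes_per_param = 2      # bf16/fp16 weights are 2 bytes each
model_gb = params * bytes_per_param / 1e9
wikipedia_gb = 19        # compressed English Wikipedia text, per the comment above

print(f"weights ≈ {model_gb:.1f} GB")                               # ≈ 6.4 GB
print(f"≈ {model_gb / wikipedia_gb:.0%} of compressed Wikipedia")   # ≈ 34%
```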
From quizzing it a bit, it has good knowledge but limited reasoning. For example, it will tell you all about the life and death of Ho Chi Minh (and, as far as I can verify, factually and in more detail than what's in English Wikipedia), but when quizzed on whether 2kg of feathers are heavier than 1kg of lead it will get it wrong.
Though I wouldn't treat it as a domain expert on anything. For example, when I asked about the safety advantages of Rust over Python, it oversold Rust a bit and claimed Python had issues it doesn't actually have.
Well, the feathers-heavier-than-lead thing is definitely somewhere in the training data.
Imo we should be testing reasoning for these models by presenting things or situations that neither the human nor the machine has seen or experienced.
Think: how often do humans have a truly new experience with no basis in past ones? Very rarely. Even learning to ride a bike could be presumed to have a link to walking/running and movement in general.
Even human "creativity" (much ado about nothing) is creating drama in the AI space... but I find this a super interesting topic, as essentially 99.9999% of all human "creativity" is just us rehashing and borrowing heavily from stuff we've seen or encountered in nature. What are elves, dwarves, etc. other than people with slightly unusual features? Even the aliens we create are based on humans/bipeds, squids/sea creatures, dragons/reptiles, etc. How often does human creativity really, _really_ come up with something novel? Almost never!
Edit: I think my overarching point is that we need to come up with better exercises to test these models, but it's almost impossible for us to do this because most of us are incapable of creating purely novel concepts and ideas. AGI perhaps isn't that far off given that humans have been the stochastic parrots all along.
My guess is it uses the same vocabulary size as Llama 3.1, which is 128,000 different tokens (roughly word pieces) to support many languages. Parameter count is less of an indicator of fitness than previously thought.
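If you want to verify, the vocabulary size can be read straight off the tokenizer; a quick sketch, assuming you have access to the gated meta-llama repo on Hugging Face (the 3.1 and 3.2 models share the tokenizer):

```python
# Read the vocabulary size from the Llama 3.1 tokenizer via Hugging Face transformers.
# Requires accepting the meta-llama license on the Hub first.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
print(tok.vocab_size)  # base vocabulary, ~128k entries
print(len(tok))        # total including special tokens
```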
That doesn't address the thing they're skeptical about, which is how much knowledge can be encoded in 3B parameters.
3B models are great for text manipulation, but I've found them to be pretty bad at having a broad understanding of pragmatics or any given subject. The larger models encode a lot more than just language in those 70B+ parameters.
Ok, but what we are probably debating is knowledge versus wisdom. Like, if I know 1+1 = 2, and I know the numbers 1 through 10, my knowledge is just those 11 facts, but my wisdom is infinite in the scope of integer addition: I can find any number, given enough time.
I'm pretty sure the AI guys are well aware of which types of models they want to produce. Models that can intake knowledge and intelligently manipulate it would mean general intelligence.
Models that can intake knowledge but only produce subsets of their training data have a use, but wouldn't be general intelligence.
Usually the problem is much simpler with small models: they have less factual information, period.
So they'll do great at manipulating text, like extraction and summarization... but they'll get factual questions wrong.
And to add to the concern above: the more coherent the smaller models are, the more likely they are to very confidently tell you wrong information. Without the usual telltale degraded output of a smaller model, it might be harder to pick out the inaccuracies.
Yes. It can converse perfectly normally in German. However, when quizzed about German idioms it hallucinates them (in fluent German). Though that's the kind of stuff even larger models often have trouble with. For example, if you ask GPT-4 about jokes in German, it will give you jokes that depend on wordplay that only works when translated to English. In normal conversation Llama seems to speak fluent German.
For Ancient Greek I just asked it (in German) to translate its previous answer into Ancient Greek, and the answer looks like Greek and, according to Google Translate, is a serviceable translation. However, Llama did add a cheeky "Πηγή: Google Translate" at the end (Πηγή means source). I know little about the differences between Ancient and Modern Greek, but it did struggle to translate modern terms like "climate change" or "Hawaii" and added them as annotations in brackets. So I'll assume it at least tried to use Ancient Greek.
However, it doesn't like switching language mid-conversation. If you start a conversation in German and after a couple of messages switch to English, it will understand you but answer in German. Most models switch to answering in English in that situation.
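This kind of quizzing is easy to reproduce against a local copy; a minimal sketch using the ollama Python client, assuming `ollama pull llama3.2` has already been run (the prompts are just examples):

```python
# Reproduce the mid-conversation language switch against a local Llama 3.2 via Ollama.
import ollama

messages = [{"role": "user",
             "content": "Erkläre mir bitte kurz, was ein Schwarzes Loch ist."}]  # German prompt
reply = ollama.chat(model="llama3.2", messages=messages)
messages.append({"role": "assistant", "content": reply["message"]["content"]})

# Switch to English mid-conversation and see which language the answer comes back in.
messages.append({"role": "user",
                 "content": "Thanks! Can you summarise that in one English sentence?"})
print(ollama.chat(model="llama3.2", messages=messages)["message"]["content"])
```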
Your findings are amazing! I have used ChatGPT to proofread compositions in German and French lately, but it would never have occurred to me to test its ability to understand idioms, which are the cherry on the cake. I'll give it a go.
As for Ancient Greek or Latin, ChatGPT has provided consistent translations and great explanations but its compositions had errors that prevented me from using it in the classroom.
All in all, ChatGPT is a great multilingual and polyglot dictionary, and I'd be glad if I could even use it offline for more autonomy.
I have tried to use Llama 3 7B and 70B for Ancient Greek and it is very bad. I will test Llama 3.2, but GPT is great at that. You might want to generate 2 or 3 GPT translations of Ancient Greek and select the best sentences from each one. Along with some human corrections, it is almost unbeatable by any human alone.
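The multi-sample approach is easy to script; a sketch using the OpenAI Python client (the model name and example sentence are placeholders, not from the thread):

```python
# Generate three independent translations and print them side by side,
# so the best sentences can be cherry-picked (and human-corrected) afterwards.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o",
    n=3,  # three candidate translations
    messages=[
        {"role": "system", "content": "Translate the user's text into Ancient Greek."},
        {"role": "user", "content": "The ships sailed at dawn."},
    ],
)
for i, choice in enumerate(resp.choices, 1):
    print(f"--- candidate {i} ---\n{choice.message.content}\n")
```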
Not one of these, but I tried it on a small language, Lithuanian. The catch is that the language has complicated grammar, though not as bad as Finnish, Estonian or Hungarian. I asked it to summarise some text and it does the job, but the grammar is not perfect and in some cases at a foreigner's level. Plus, it invented some words with no meaning, e.g. `„Sveika gyvensena“ turi būti *atnemitinamas* viso kurso *vykišioje*.` (roughly, "'A healthy lifestyle' must be [made-up word] in the [made-up word] of the whole course").
In Greek, it's just making stuff up. I asked it how it was, and it asked me how much I like violence. It looks like it's really conflating languages with each other; it just asked me something in a weird mix of Spanish and Greek.
Yeah, chatting more, it's confusing Spanish and Greek. Half the words are Spanish, half are Greek, but the words are more or less the correct ones, if you speak both languages.
EDIT: Now it's doing Portuguese:
> Εντάξει, πού ξεκίνησα? Εγώ είναι ένα κigneurnative πρόγραμμα ονομάζεται "Chatbot" ή "Μάquina Γλωσσής", που δέχθηκε να μοιράσει τη βραδύτητα με σένα. Φυσικά, não sono um essere humano, así que não tengo sentimentos ou emoções como vocês.

(Roughly: "Okay, where did I start? I am a [garbled] program called 'Chatbot' or 'Language Machine' that agreed to share the slowness with you. Of course, I am not a human being, so I don't have feelings or emotions like you.")
SEEKING WORK | REMOTE | U.S. or Canada (Located in the Huntsville, AL area.)
Automate business processes. Get more done, more accurately, faster, and less painfully.
PhD in laser physics / quantum optics, startup veteran, 15 years of experience automating systems, from laser control using real-time video analysis to email and spreadsheet ingestion and routing.
https://github.com/dhbradshaw
Preferred tools: user interviews, systems analysis, documentation, Rust, Python, TypeScript, Postgres, AWS, local LLMs, evals and testing.
Free consultation, satisfaction guaranteed.
Reach out to dhbradshaw at gmail!