
SEEKING WORK | REMOTE | U.S. or Canada (Located in the Huntsville AL area.)

Automate business processes. Get more done, more accurately, faster, and less painfully.

PhD in laser physics / quantum optics, startup veteran, 15 years of experience automating systems, from laser control using real-time video analysis to email and spreadsheet ingestion and routing.

https://github.com/dhbradshaw

Preferred tools: user interviews, systems analysis, documentation, Rust, Python, TypeScript, Postgres, AWS, local LLMs, evals, and testing.

Free consultation, satisfaction guaranteed.

Reach out to dhbradshaw at gmail!


Great for space


Quote:

The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.

Really cool


The more leverage a piece of code has, the more good or damage it can do.

The more constraints we can place on its behavior, the harder it is to mess up.

If it's riskier code, constrain it more with better typing, testing, design, and analysis.

Constraints are to errors (including hallucinations) as water is to fire.
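
As a toy sketch of what "constrain it more with better typing" can look like, here's a hypothetical Python example using NewType so a type checker catches unit mix-ups before they do damage:

    # Distinct unit types: mypy (or similar) rejects passing Feet where Meters is expected.
    from typing import NewType

    Meters = NewType("Meters", float)
    Feet = NewType("Feet", float)

    def feet_to_meters(ft: Feet) -> Meters:
        return Meters(ft * 0.3048)

    def set_altitude(altitude: Meters) -> None:
        print(f"Altitude set to {altitude} m")

    set_altitude(feet_to_meters(Feet(1000.0)))  # OK
    # set_altitude(Feet(1000.0))  # flagged by the type checker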


Interesting to see this coming out of Apple


This is really cool -- going to show it off to my team. I love the fact that you opened it up so that it will work with Jupyter notebooks as well.


Hmm, could be a good time to set up as a contractor serving Salesforce


There is always excellent money to be made as a contractor doing Salesforce work.


I like the idea of having a wiki be mostly serverless -- like Lambda backed by S3. A little searching found this:

https://github.com/raboof/serverless-wiki
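
The read path in that style can be tiny. Here's a minimal sketch of a Lambda handler serving Markdown pages out of S3; the bucket name, key layout, and event shape are assumptions for illustration, not how serverless-wiki actually does it:

    # Hypothetical Lambda (function URL / HTTP API v2 event) reading wiki pages from S3.
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-wiki-pages"  # hypothetical bucket name

    def handler(event, context):
        # Map the request path to an S3 key, e.g. /HomePage -> pages/HomePage.md
        page = event.get("rawPath", "/").lstrip("/") or "HomePage"
        try:
            obj = s3.get_object(Bucket=BUCKET, Key=f"pages/{page}.md")
            return {"statusCode": 200,
                    "headers": {"Content-Type": "text/markdown"},
                    "body": obj["Body"].read().decode("utf-8")}
        except s3.exceptions.NoSuchKey:
            return {"statusCode": 404, "body": f"No page named {page}"}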


Tried out the 3B on Ollama, asking questions in optics, bio, and Rust.

It's super fast, with a lot of knowledge, a large context, and great understanding. Really impressive model.
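
For reference, roughly how I was poking at it via the ollama Python client; the model tag and prompt are just examples, so adjust to whatever tag Ollama publishes:

    # Chat with the 3B model through the ollama Python package.
    # Assumes `ollama pull llama3.2:3b` has already been run.
    import ollama

    resp = ollama.chat(
        model="llama3.2:3b",
        messages=[{"role": "user",
                   "content": "Explain chromatic dispersion in an optical fiber."}],
    )
    print(resp["message"]["content"])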


I question whether a 3B model can have “a lot of knowledge”.


As a point of comparison, the Llama 3.2 3B model is 6.5GB. The entirety of English Wikipedia's text is 19GB (as compressed with an algorithm from 1996; newer compression formats might do better).

It's not a perfect comparison, and Llama does a lot more than English, but I would say 6.5GB of data can certainly contain a lot of knowledge.
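
For what it's worth, the download size roughly checks out as fp16 weights; a quick back-of-the-envelope (parameter count approximate):

    # ~3.2B parameters at 2 bytes each (fp16/bf16)
    params = 3.2e9
    bytes_per_param = 2
    print(params * bytes_per_param / 1e9, "GB")  # ~6.4 GB, near the 6.5GB download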


From quizzing it a bit, it has good knowledge but limited reasoning. For example, it will tell you all about the life and death of Ho Chi Minh (factual as far as I can verify, and with more detail than what's in English Wikipedia), but when quizzed on whether 2kg of feathers is heavier than 1kg of lead, it gets it wrong.

Though I wouldn't treat it as a domain expert on anything. For example, when I asked about the safety advantages of Rust over Python, it oversold Rust a bit and claimed Python had issues it doesn't actually have.


> it oversold Rust a bit and claimed Python had issues it doesn't actually have

So exactly like a human


Well, the feathers-heavier-than-lead thing is definitely somewhere in the training data.

Imo we should be testing reasoning for these models by presenting things or situations that neither the human nor the machine has seen or experienced.

Think: how often do humans have a truly new experience with no basis in past ones? Very rarely; even learning to ride a bike presumably has a link to walking/running and movement in general.

Even human "creativity" (much ado about nothing) is creating drama in the AI space... but I find this a super interesting topic, as essentially 99.9999% of all human "creativity" is just us rehashing and borrowing heavily from stuff we've seen or encountered in nature. What are elves, dwarves, etc. but people with slightly unusual features? Even the aliens we create are based on humans (bipedal), squid (sea creatures), dragons (reptiles), etc. How often does human creativity really, _really_ come up with something novel? Almost never!

Edit: I think my overarching point is that we need to come up with better exercises to test these models, but it's almost impossible for us to do this because most of us are incapable of creating purely novel concepts and ideas. Perhaps AGI isn't that far off, given that humans have been the stochastic parrots all along.


I wonder if spelling out the weight would work better: "two kilograms" for wider token input.
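
To see the token granularity in question, here's a quick check using tiktoken's cl100k_base encoding as a stand-in (Llama's tokenizer splits differently, so this is only suggestive):

    # Compare how terse vs spelled-out phrasing tokenizes.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for text in ["2kg of feathers", "two kilograms of feathers"]:
        tokens = enc.encode(text)
        print(text, "->", len(tokens), "tokens:", [enc.decode([t]) for t in tokens])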


It still confidently said that the feathers were lighter than the lead. It did correct itself when I asked it to check again, though.


My guess is it uses the same vocabulary size as Llama 3.1, which is 128,000 different tokens (word pieces), to support many languages. Parameter count is less of an indicator of fitness than previously thought.


That doesn't address the thing they're skeptical about, which is how much knowledge can be encoded in 3B parameters.

3B models are great for text manipulation, but I've found them to be pretty bad at having a broad understanding of pragmatics or any given subject. The larger models encode a lot more than just language in those 70B+ parameters.


Ok, but what we are probably debating is knowledge versus wisdom. Like, if I know 1+1 = 2, and I know the numbers 1 through 10, my knowledge is just 11 facts, but my wisdom is infinite in the scope of integer addition. I can find any number, given enough time.

I'm pretty sure the AI guys are well aware of which types of models they want to produce. Models that can intake knowledge and intelligently manipulate it would mean general intelligence.

Models that can intake knowledge and only produce subsets of its training data have a use, but wouldn't be general intelligence.


I don't think this is right.

Usually the problem is much simpler with small models: they have less factual information, period.

So they'll do great at manipulating text, like extraction and summarization... but they'll get factual questions wrong.

And to add to the concern above, the more coherent the smaller models are, the more likely they are to very competently tell you wrong information. Without the usual telltale degraded output of a smaller model, it might be harder to pick out the inaccuracies.


Can it speak foreign languages like German, Spanish, Ancient Greek?


Yes. It can converse perfectly normally in German. However, when quizzed about German idioms it hallucinates them (in fluent German). That's the kind of thing even larger models often have trouble with, though. For example, if you ask GPT-4 for jokes in German, it will give you jokes that depend on wordplay that only works when translated to English. In normal conversation Llama seems to speak fluent German.

For Ancient Greek I just asked it (in German) to translate its previous answer to Ancient Greek, and the answer looks like Greek and, according to Google Translate, is a serviceable translation. However, Llama did add a cheeky "Πηγή: Google Translate" at the end (Πηγή means source). I know little about the differences between Ancient and Modern Greek, but it did struggle to translate modern terms like "climate change" or "Hawaii" and added them as annotations in brackets. So I'll assume it at least tried to use Ancient Greek.

However, it doesn't like switching language mid-conversation. If you start a conversation in German and switch to English after a couple of messages, it will understand you but answer in German. Most models switch to answering in English in that situation.


“However Llama did add a cheeky "Πηγή: Google Translate" at the end”

That’s interesting; could this be an indicator that someone is running content through GT and training on the results?


Thank you very much for taking the time.

Your findings are amazing! I have used ChatGPT to proofread compositions in German and French lately, but it would never have occurred to me to test its ability to understand idioms, which are the cherry on the cake. I'll give it a go.

As for Ancient Greek or Latin, ChatGPT has provided consistent translations and great explanations, but its compositions had errors that prevented me from using it in the classroom.

All in all, ChatGPT is a great multilingual, polyglot dictionary, and I'd be glad if I could use it offline for more autonomy.


I have tried to use Llama 3 7B and 70B for Ancient Greek, and they are very bad. I will test Llama 3.2, but GPT is great at that. You might want to generate two or three GPT translations of Ancient Greek and select the best sentences from each one. Along with some human corrections, it is almost unbeatable by any human alone.


Not one of those, but I tried a small language, Lithuanian. The catch is that the language has complicated grammar, though not as bad as Finnish, Estonian, or Hungarian. I asked it to summarise some text, and it does the job, but the grammar is not perfect and in some cases at a foreigner's level. Plus, it invented some words with no meaning, e.g. `„Sveika gyvensena“ turi būti *atnemitinamas* viso kurso *vykišioje*.` (the starred words don't exist in Lithuanian).


In Greek, it's just making stuff up. I asked it how it was, and it asked me how much I like violence. It looks like it's really conflating languages with each other; it just asked me something in a weird mix of Spanish and Greek.

Yeah, chatting more, it's confusing Spanish and Greek. Half the words are Spanish, half are Greek, but the words are more or less the correct ones, if you speak both languages.

EDIT: Now it's doing Portuguese:

> Εντάξει, πού ξεκίνησα? Εγώ είναι ένα κigneurnative πρόγραμμα ονομάζεται "Chatbot" ή "Μάquina Γλωσσής", που δέχθηκε να μοιράσει τη βραδύτητα με σένα. Φυσικά, não sono um essere humano, así que não tengo sentimentos ou emoções como vocês.


So good it feels like maybe I can read their lips.


This is the best compliment :) and also a good idea… could a trained lip reader understand what the videos are saying? Good benchmark!

