Ratio/quantity is important, but quality is even more so.
In recent LLMs, filtered internet text is at the low end of the quality spectrum. The higher end is curated scientific papers, synthetic and rephrased text, RLHF conversations, reasoning CoTs, etc. English/Chinese/Python/JavaScript dominate here.
The issue is that when there's a difference in training data quality between languages, LLMs likely associate that difference with the languages if not explicitly compensated for.
IMO it would be far more impactful to generate and publish high-quality data in minority languages for current model trainers than to train new models that are simply enriched with a higher percentage of low-quality internet scrapings in those languages.
It requires tax increases, and the average earner's UBI will typically balance out the tax increase, meaning they don't directly profit.
UBI isn't about giving everyone free money. It's about giving everyone a safety net, so that they can take bigger economic risks and aren't pushed into crime or bullshit work.
The upper half of society will only see the indirect benefits, like having greater employment/investment choices due to more entrepreneurialism.
(1) Retirees with skills don't suddenly decide to become entrepreneurs when they reach 65.
(2) people on the dole don't suddenly become entrepreneurs. We even used to have a specific programme in New Zealand for the unemployed to start their own business . . . I'm fairly sure it didn't work.
(3) mothers on the DPB get a good whack of money even with kids that don't need huge time investment. It is rare to see them do anything more entrepreneurial than an under-the-table job.
> It requires tax increases, and the average earner's UBI will typically balance out the tax increase, meaning they don't directly profit.
A good portion of my salary is already taken in tax, and the government wastes it. I've seen the waste first hand when contracting for both local and national government. I was so disgusted by it that I have made every effort to avoid working with them since.
I've also seen this waste happen in large charities and ossified corporations. The former also disgusted me, as I knew they would simply piss away a few thousand on complete BS, money that took a whole village to collect and never went towards the charity's stated purpose. As a result I don't donate to any charities that aren't local.
Every time someone suggests a tax increase, I know for a fact they haven't seen the waste happen first hand.
> UBI isn't about giving everyone free money. It's about giving everyone a safety net, so that they can take bigger economic risks and aren't pushed into crime or bullshit work.
Giving everyone a safety net will require giving people money that is taken from others. To the people who benefit, it will be seen as "free", will become "expected", and won't be treated as a safety net.
Being a responsible adult is about reducing the amount of risk you are taking, not increasing it.
So what you will be doing is teaching people to essentially gamble, and people did something similar during COVID. Some took their cheques and put them into crypto, meme stocks or whatever. Some won big; most didn't.
I've met people in my local area that have lost huge amounts of money on risky investments, everything from property developments, to bitcoin. Creating an incentive for risk taking without the consequences is actually reckless, a massive moral hazard and will simply create perverse incentives.
> The upper half of society will only see the indirect benefits, like having greater employment/investment choices due to more entrepreneurialism.
You will be taxing those people more and they will have less to invest. The reason why many people invest is because they have disposable income that they can afford to risk.
By taxing people more (which you admit would have to happen), they will have less disposable income and will be inclined to invest less as a result.
That discussion also makes me worry that they may try to use LLMs or LLM-based metrics to measure the size of the gap as a proxy for the value of the content.
The landlord of the marketplace should probably not dabble in the appraisal of products, whether for factuality or value.
As a content consumer, I'm also hoping to be part of the ecosystem. I already use Patreon a lot as "AdBlock absolution", but it doesn't fix the market dynamics. Major content platforms tend to stagnate or worsen over time, because they prefer to sell impressions to advertisers than a good product to consumers.
What makes you think the secrets are small enough to fit inside people's heads, and aren't like a huge codebase of data scraping and filtering pipelines, or a DB of manual labels?
Please consider also describing the business model on the website, even if it's hidden away in a FAQ. I have so much subscription fatigue now that I just don't try things out if a subscription is inevitable. I'm happy to pay for good products, just not happy to be forced to pay a fixed rate for continued access even as my usage dwindles.
If you are thinking of adding a one-off-donation-style purchase method, consider giving annual reminders to renew it. At least in my case, I'm not unwilling to pay repeatedly if development continues, just unwilling to make an upfront ongoing commitment.
I don't think retrofitting existing languages/ecosystems is necessarily a lost cause. Static enforcement requires rewrites, but runtime enforcement gets you most of the benefit at a much lower cost.
As long as all library code is compiled/run from source, a compiler/runtime can replace system calls with wrappers that check caller-specific permissions, and it can refuse to compile or insert runtime panics if the language's escape hatches would be used. It can be as safe as the language is safe, so long as you're ok with panics when the rules are broken.
It'd take some work to document and distribute capability profiles for libraries that don't care to support it, but a similar effort was proven possible with TypeScript.
I actually started working on a tool like that for fun. At each syscall it would walk back up the stack, check which shared object each function came from, and compare that against a policy until it found something explicitly allowed or denied. I don't think it would necessarily be bulletproof enough to trust fully, but it was fun to write.
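The caller-sensitive check described above can be sketched in a few lines. This is a toy model, not the actual tool: it uses Python's `inspect` module to walk the call stack at a guarded operation and match each caller's module name against a policy, standing in for resolving return addresses to shared objects at a real syscall boundary. The policy contents and names (`POLICY`, `guarded_open`, `"trusted_mod"`) are all made up for illustration.

```python
import inspect

# Toy policy: map module names to the operations they may perform.
# Real enforcement would resolve stack return addresses to shared
# objects at the syscall boundary; module names are a stand-in here.
POLICY = {
    "trusted_mod": {"open_file"},
}

def check_caller(operation):
    """Walk up the call stack; allow if any frame's module is permitted."""
    for frame_info in inspect.stack()[1:]:
        module = frame_info.frame.f_globals.get("__name__", "")
        if operation in POLICY.get(module, set()):
            return True
    return False

def guarded_open(path):
    # The wrapper a compiler/runtime would substitute for the raw call.
    if not check_caller("open_file"):
        raise PermissionError(f"caller not permitted to open {path}")
    return open(path)
```

A real implementation would check the innermost policy-bearing frame rather than any frame (otherwise a permitted caller further up the stack launders permissions for untrusted code below it), which is exactly the kind of subtlety that makes "bulletproof" hard.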
The last major innovation as a product was PWA support starting in 2016.
Browsers used to try new ideas like RSS, widgets, shared and social browser sessions. Interfaces to facilitate low-friction integration with the rest of your life, and to multiplex data sources so that it's not a hassle to have many providers for [news, entertainment, social] experiences.
Likely no coincidence that this innovation languished once monopolies started pumping money into the ecosystem.
Wholeheartedly agree. Opera, before it pivoted to Chromium and was sold to Chinese investors, was I think the apex example of this. I will never stop singing the praises of Opera Unite, which was a brilliant and potentially revolutionary way of leveraging the browser for something that could have been the basis of a peer-to-peer web and social connection.
> It's interesting that there are no reasoning models yet
This may be merely a naming distinction, leaving the name open for a future release based on their recent research such as coconut[1]. They did RL post-training, and when fed logic problems it appears to do significant amounts of step-by-step thinking[2]. It seems it just doesn't wrap it in <thinking> tags.
> Or is Behemoth just going through post-training that takes longer than post-training the distilled versions?
This is likely the main explanation. RL fine-tuning repeatedly alternates between inference, to generate and score responses, and training on those responses. In inference mode they can parallelize across responses, but each response is still generated one token at a time. Likely 5+ minutes per iteration if they're aiming for 10k+ token CoTs like other reasoning models.
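The "5+ minutes per iteration" figure follows from a back-of-envelope calculation. All numbers below are illustrative assumptions, not anything Meta has published; the point is that wall-clock time per iteration is bounded by the sequential decode of the longest response, which parallelism across responses can't hide.

```python
# Rough, illustrative numbers; none are from Meta.
tokens_per_cot = 10_000          # assumed target CoT length
tokens_per_second = 30           # assumed sequential decode speed per response
responses_per_iteration = 8_192  # batch generated in parallel across the cluster

# Parallelism hides the batch size but not the response length:
# every response still decodes one token at a time.
seconds_per_iteration = tokens_per_cot / tokens_per_second
minutes_per_iteration = seconds_per_iteration / 60

total_tokens = tokens_per_cot * responses_per_iteration  # decode work per iteration
print(f"{minutes_per_iteration:.1f} minutes per RL iteration")  # prints "5.6 minutes per RL iteration"
```

Multiply that by thousands of RL iterations and the post-training schedule for a huge model stretches to weeks even before scoring and gradient updates are counted.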
There's also likely an element of strategy involved. We've already seen OpenAI hold back releases to time them to undermine competitors' releases (see o3-mini's release date & pricing vs R1's). Meta probably wants to keep that option open.