Yes, I was very surprised that after the whole "scandal" around ChatGPT becoming too sycophantic, there was this massive change in tone from the last preview model (05-06) to the 06-05/GA model. The tone is really off-putting. I really liked how the preview versions felt like intelligent conversation partners, and I recognize what you're saying about useful pushback - they were my favorite set of models (the few preview iterations before this one), and I'm sad to see them disappear.
Many people on the Google AI Developer forums have also noted either bugs or just performance regression in the final model.
https://ppc.cs.aalto.fi/ covers some of this (overlapping with the topics the person you responded to mentioned, but not covering all, and including some others)
If the only moat really is the scale of computational resources, that's great news for users, because it will be an extremely competitive market where prices get driven down very effectively.
I suspect that model quality/vibes and integrations will play a role as well though.
Just start. Most shirt construction is based on a two-piece collar, a yoke, and button front. I started with a cheap pattern (McCalls 2447) that covered those aspects, and started modifying from there, but any dress or sport shirt pattern will give you the same basic construction.
Any machine will do. I had borrowed a cheapo Singer Simple machine from my father-in-law to make some pillows. The reverse lever broke on my second shirt; I made a new one with some D-shaft and a coupler from McMaster-Carr and I'm still using it.
The first shirt was recognizable as a shirt and sort of fit, the second shirt was better, and by the third I had mostly dialed things in. Like any skill, practice makes perfect. I've mostly sewn woven fabric, except for some TOS tunics in velour knits. I made a pair of pants that I wasn't happy with and intend to loop back to that one of these days. Thing is, wild fabric is more appropriate for shirts than pants, so that is where I have focused.
Not the commenter you asked but I inherited my mother's sewing machine and decided to make a ball cap by seam ripping apart one that was wearing out but fit really well.
I traced the pieces onto new fabric (waxed cotton) and reused the existing plastic brim insert. It's still in use and I enjoy telling people I made it, although there are a few things I'd do differently. I watched a few YouTube videos on constructing a cap for tips on things like machine settings, top stitching, fastenings, etc., then just winged it after I felt like my theoretical knowledge had plateaued.
It would undoubtedly have been cheaper to buy a new cap, but since I'm unemployed, long term burned out and also newly-diagnosed with ADHD, some things are just gut feeling without making a great deal of sense these days.
Now as things wear out I'm cutting them apart to study the pieces and make new versions. It's surprising what you can make. I just made some quilted slippers for my kid by tracing around his feet and using scrap leather I had lying around. My next project is a pair of trousers. Weirdly, as a recovering perfectionist, I find myself a lot more open to making prototypes and learning from mistakes than I ever was in my career.
I can't tell whether you're insinuating that Singapore is a pass-through for H100s heading toward China, or whether there is some significant development taking place in Singapore that I'm unaware of?
> Singapore plays a vital role in Nvidia's global business, accounting for 22% of its revenue as of Q3 FY2025, up from 9% in Q3 FY2023 when the first significant restrictions on AI GPU sales to Chinese were introduced
What's happening definitely makes me nervous, but "at best a WW3-event and at worst an extinction-event" seems a bit much. Mainly because there are a _lot_ of unknowns. Better try to get comfortable with just riding this out.
It really isn't. Climate change is going to make large amounts of land unlivable. That's going to cause a climate refugee crisis. I agree the effects of that refugee crisis are unknown, but I can't see any resolution that doesn't involve increased nationalism, civil wars, and violent resource conflicts. Given this is a global crisis, that's a recipe for WW3.
This was all avoidable, of course. But instead of fixing it, we spent decades fiddling around with toys like LLMs. Whee.
I'm trying to parse the idea of "a collective agreement" but can't fully wrap my head around how that would work.
It seems to me more like the lack of a "Walmart Law" is a result of e.g. lack of economies of scale and other economic structure, rather than some collective agreement. (If it was profitable to break out of that agreement and start a "Walmart Law", it seems we'd see that happen pretty quickly?)
But if you know more about this and I'm off the mark I'd love to learn
For me the excitement is that around the o3 announcement I had a feeling like we were heading to an OpenAI / Sam Altman controlled dystopia. This resets that - you can run the model yourself, you can modify it yourself, it's essentially on par with the best public models, and it gives hope that the smaller players have a fighting chance going forward. They also published their innovations bringing back some of the feeling of open science that used to be in ML research but which mostly went away.
Google models are already in the lead in many areas of capability and cost, so I never felt like OpenAI was dominant. OpenAI was first to make a splash, but ChatGPT is in a roughly five-way tie in terms of what it can do.
Given what we just saw, with the DeepSeek team squeezing a lot of extra performance out of a more efficient GPU implementation, and given that the model is still optimized for GPU rather than CPU - is it unreasonable to think that the $6k setup described still leaves some performance on the table, which could be squeezed out with better optimization for these particular CPUs?
The TLDR is that llama.cpp's NUMA support is suboptimal, which hurts performance on this machine relative to what it should be. A single-socket version would likely perform better until that is fixed. After it is fixed, a dual-socket machine would likely run at the same speed as a single-socket machine.
If someone implemented a GEMV that scales with NUMA nodes (i.e. PBLAS, but for the data types used in inference), it might be possible to get higher performance from a dual socket machine than we get from a single socket machine.
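A minimal sketch of the row-partitioning idea, assuming nothing beyond what PBLAS-style GEMV implies (`partitioned_gemv` is a made-up name; numpy here only shows the math - a real implementation would allocate each row block in the memory of its NUMA node and pin the worker thread there, e.g. via libnuma):

```python
import numpy as np

# NUMA-style row partitioning for GEMV (y = A @ x): each "node" owns a
# contiguous block of rows of A and computes its slice of y independently,
# so every node reads only its local block of the weight matrix.
def partitioned_gemv(A: np.ndarray, x: np.ndarray, num_nodes: int) -> np.ndarray:
    row_blocks = np.array_split(A, num_nodes, axis=0)
    return np.concatenate([block @ x for block in row_blocks])

A = np.arange(12, dtype=np.float64).reshape(4, 3)
x = np.ones(3)
assert np.allclose(partitioned_gemv(A, x, 2), A @ x)
```

Since each node's rows (and the corresponding slice of the output) never cross the interconnect, aggregate bandwidth scales with the number of nodes instead of being bottlenecked on one socket's memory controller.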
No, because the bottleneck is RAM bandwidth. The weights are already quantized and are otherwise essentially random, so they can't be compressed in any meaningful way.
How much bandwidth do we actually need per-token generation? Let's take one open-source model as a starting point since not all models are created the same.
For non-MoE models, every generated token requires streaming the entire model through the CPU. So if it is a 32B-parameter model quantized to 8 bits/parameter, that is 32GB of RAM traffic per token. If your RAM does 64GB/s, that is 2 tok/s.
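That back-of-the-envelope estimate can be written out as follows (the function name is mine, not from any library; it's an upper bound that ignores KV-cache reads and compute):

```python
# Rough upper bound on decode speed for a dense (non-MoE) model:
# every generated token must stream all weights from RAM once.
def max_tokens_per_sec(params_b: float, bits_per_param: float, ram_gb_per_sec: float) -> float:
    bytes_per_token = params_b * 1e9 * bits_per_param / 8  # weight bytes read per token
    return ram_gb_per_sec * 1e9 / bytes_per_token

# 32B parameters at 8-bit quantization on 64 GB/s RAM
print(max_tokens_per_sec(32, 8, 64))  # -> 2.0 tok/s
```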
I didn't get the impression that the math is quite that simple. The first obvious reason I can think of is the attention mechanism being used: both GQA and MQA shrink the KV cache relative to MHA, and therefore demand less memory bandwidth per token on top of the weight reads.
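To give a rough sense of why the attention variant matters, here is an illustrative per-token KV-cache read estimate. All dimensions are assumed for the example (Llama-2-7B-like values), not taken from the thread:

```python
# During decode, each token reads the whole KV cache for the current context.
# GQA/MQA shrink it by sharing K/V across query heads.
def kv_cache_bytes_per_token(layers, kv_heads, head_dim, context_len, bytes_per_el=2):
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_el  # 2 = K and V

# Assumed 7B-class dims: 32 layers, 128-dim heads, fp16, 4096-token context.
mha = kv_cache_bytes_per_token(layers=32, kv_heads=32, head_dim=128, context_len=4096)
gqa = kv_cache_bytes_per_token(layers=32, kv_heads=8, head_dim=128, context_len=4096)
print(mha / 2**30, gqa / 2**30)  # -> 2.0 0.5 (GiB read per token; MHA is 4x GQA here)
```

At long contexts this KV traffic adds materially to the per-token weight reads, which is why the simple weights-over-bandwidth division is only a first approximation.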