Sounds like a good use of "spare" time to me and not that different from many a lab I've been part of: someone gets a hunch, sets up an experiment to follow it, proves or disproves whatever they were after, pulls down the experiment, rinse, repeat.
"Due to the strict new guidelines of the EU AI Act that take effect on August 2nd 2025, we recommend that each R1T/R1T2 user in the EU either familiarizes themselves with these requirements and assess their compliance, or ceases using the model in the EU after August 1st, 2025."
Doesn't the DeepSeek license completely forbid any use in the EU already? How can a German company legally build this in the first place (which they presumably did)?
While I agree the word could be appropriate, I'm asking a meta question about how it is typically used, and whether or not we're conveying something unintentional by using it in this context as well. I don't consider "variants" a good thing because I lived through a few years of COVID.
Yes. If you look at the diagram that plots performance against the number of output tokens, you can see that R1T2 uses about 1/3 of the output tokens that R1-0528 uses.
Keep in mind, the speed improvement doesn’t come from the model running any faster (it’s the exact same architecture as R1, after all) but from using fewer output tokens while still achieving very good results.
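Back-of-envelope version of that, in case it helps (the throughput and token counts below are made-up numbers just to show the scaling, not measurements of either model):

    # Same architecture => same per-token decode speed for both models,
    # so wall-clock time scales directly with the number of output tokens.
    TOKENS_PER_SECOND = 30.0          # assumed decode speed, identical for both

    def response_time(output_tokens: float) -> float:
        """Seconds to generate a response at the assumed decode speed."""
        return output_tokens / TOKENS_PER_SECOND

    r1_0528_tokens = 9_000            # hypothetical token count for R1-0528
    r1t2_tokens = r1_0528_tokens / 3  # R1T2 reportedly uses ~1/3 the tokens

    print(response_time(r1_0528_tokens))  # 300.0 s
    print(response_time(r1t2_tokens))     # 100.0 s -> ~3x "faster" end to end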
Fair point. More benchmarks are definitely good but I’m optimistic that they will show similar results.
Anecdotally, I can say that my personal experience with the model is in line with what the benchmarks claim: It’s a bit smarter than R1, a bit faster than R1, much faster than R1-0528, but not quite as smart. (Faster meaning fewer output tokens.) For me, it’s at a sweet spot and I use it as my daily driver.
It is always about the trade-off between those two parameters.
Of course an increase in both would be optimal, but a small sacrifice in performance/accuracy in exchange for being 200% faster is worth noting. Around a 10% drop in accuracy for a 200% speed-up, some would take it!
Also, that “speed-up” is actually hiding “less compute used”, which is a proxy for cost. Assuming this is 200% faster purely because it needs less compute, that should mean it costs roughly 1/3 as much to run for a 10% decrease in quality of output.
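The token arithmetic, spelled out (the per-token price and volumes are assumed, purely illustrative):

    # If output is billed per token at the same rate for both models and R1T2
    # emits ~1/3 the tokens, the bill drops to ~1/3 as well.
    PRICE_PER_MILLION_OUTPUT_TOKENS = 2.0  # hypothetical $/1M output tokens

    def cost_usd(output_tokens: int) -> float:
        return output_tokens / 1_000_000 * PRICE_PER_MILLION_OUTPUT_TOKENS

    baseline_tokens = 9_000_000        # hypothetical monthly output for R1-0528
    chimera_tokens = baseline_tokens // 3

    print(f"R1-0528: ${cost_usd(baseline_tokens):.2f}")  # $18.00
    print(f"R1T2:    ${cost_usd(chimera_tokens):.2f}")   # $6.00, ~1/3 the cost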