If writers are using AI to write articles, and readers are using AI to summarize the same articles, what's the underlying inefficiency here? That writers should just be posting brief summaries of the articles in the first place? Or maybe they just need to be prompting AI to create a summary rather than a full article?
Differences in how people want to consume the article, and what information they’re looking for. Some want snippets, some want summaries, some want long form.
We have information compression machines now. Might as well raw dump the information and let the machine package it up in the format we prefer for consumption, instead of pre-packaging it. (Yeah, this is effectively what authors are doing…currently they can still do novel things that the compression machines can’t, but how long will that last?)
Yeah, the younguns smell opportunity and run towards it. They'll be fine. It's not the younguns but the less experienced folks already in the corporate world who have the most to lose.
The really experienced among us will have made this mistake enough times to know to avoid it.
I didn’t get a smartphone until the 2010s. Stupid, I know, but it was seen as a badge of honour in some circles. ‘Bah, I don’t even use a smartphone,’ we’d say as the young crowd went about their lives, never getting lost without a map and generally having an easier time of it, since they didn’t have that mental block.
AI is going to be similar, no doubt. I’m already seeing ‘bah, I don’t use AI coding assistants’ type posts, worn as a badge of honour. ‘OK, you’re making things harder for yourself’ should be the reply, but we’ll no doubt have people wearing it as a badge of honour for some time yet.
Conversely, how much larger can you scale if frontier models currently need only 3 consumer computers?
Imagine having 300. Could you build even better models? Is DeepSeek the right team to deliver that, or can OpenAI, Meta, HF, etc. adapt?
Going to be an interesting few months on the market. I think OpenAI lost a LOT in the board fiasco. I am bullish on HF. I anticipate Meta will lose folks to brain drain in response to management equivocation around company values. I don't put much stock in Google's or Microsoft's AI capabilities; they are the new IBMs and are no longer innovating except at obvious margins.
Google is silently catching up fast with Gemini. They're also pursuing next-gen architectures like Titans. But most importantly, the frontier of AI capabilities is shifting towards using RL at inference (thinking) time to perform tasks. Who has more data than Google there? They have a gargantuan database of queries paired with subsequent web navigation, actions, follow-up queries, etc. Nobody can recreate this; Bing failed to get enough market share. Also, when you think of RL talent, which company comes to mind? I think Google has everyone checkmated already.
Can you say more about using RL at inference time, ideally with a pointer to read more about it? This doesn’t fit into my mental model, in a couple of ways. The main way is right in the name: “learning” isn’t something that happens at inference time; inference is generating results from already-trained models. Perhaps you’re conflating RL with multistage (e.g. “chain of thought”) inference? Or maybe you’re talking about feeding the result of inference-time interactions with the user back into subsequent rounds of training? I’m curious to hear more.
I wasn't clear. Model weights aren't changing at inference time. I meant that at inference time the model will output a sequence of thoughts and actions to perform tasks given to it by the user. For instance, to answer a question it will search the web, navigate through some sites, scroll, summarize, etc. You can model this as a game played by emitting a sequence of actions in a browser, and RL is the technique you want for training that component. To scale this up you need a massive number of examples of action sequences taken in the browser, the outcome each led to, and a label for whether that outcome was desirable or not. I am saying that by recording users googling things and emailing each other for decades, Google has this massive dataset to train their RL-powered browser-using agent. DeepSeek proving that simple RL can be cheaply applied to a frontier LLM and have reasoning organically emerge makes this approach more obviously viable.
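To make that concrete, here's a minimal sketch of what one such training example might look like, assuming we frame browsing as an episodic RL problem with outcome-only rewards. All the names here (BrowserStep, Episode, outcome_reward) are hypothetical, not anything from Google's or DeepSeek's actual pipelines:

```python
# Hypothetical schema for one logged browsing episode, framed as RL data.
from dataclasses import dataclass, field
from typing import List

@dataclass
class BrowserStep:
    observation: str   # e.g. a snapshot/summary of the current page
    action: str        # e.g. 'search("...")', 'click(3)', 'scroll_down()'

@dataclass
class Episode:
    query: str                                      # the user's original request
    steps: List[BrowserStep] = field(default_factory=list)
    outcome: str = ""                               # how the session ended
    desirable: bool = False                         # did it satisfy the user?

def outcome_reward(episode: Episode) -> float:
    """Sparse, outcome-only reward: no per-step supervision,
    just whether the whole trajectory ended well."""
    return 1.0 if episode.desirable else 0.0

# Decades of logged search sessions could, in principle, be mined into
# records like this: query -> actions -> outcome -> implicit label
# (user stopped searching, clicked and stayed, rephrased the query, etc.).
example = Episode(
    query="best way to descale a kettle",
    steps=[
        BrowserStep("google results page", "click(result_2)"),
        BrowserStep("article: 'Descaling with vinegar'", "scroll_down()"),
    ],
    outcome="user stopped searching after reading the article",
    desirable=True,  # inferred from behaviour, e.g. no follow-up query
)
print(outcome_reward(example))  # 1.0
```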
Makes sense, thanks. I wonder whether human web-browsing strategies are optimal for use in an LLM, e.g. given how much faster LLMs are at reading the webpages they find compared to humans? Regardless, it does seem likely that Google’s dataset is good for something.
How quickly the narrative went from 'Google silently has the most advanced AI but they are afraid to release it' to 'Google is silently catching up', all using the same 'core Google competencies' to infer Google's position of strength. I wonder what the next, lower rung of 'Google silently leveraging its strengths' will be?
Google is clearly catching up. Have you tried the recent Gemini models? Have you tried deep research? Google is like a ship that is hard to turn around but also hard to stop once in motion.
It seems like there is MUCH to gain by migrating to this approach, and the cost of switching should theoretically be far less than the rewards to be reaped.
I expect all the major players are already working full-steam to incorporate this into their stacks as quickly as possible.
IMO, this seems incredibly bad for Nvidia, and incredibly good for everyone else.
I don't think this seems particularly bad for ChatGPT. They've built a strong brand. This should just help them reduce - by far - one of their largest expenses.
They'll have a slight disadvantage compared to, say, Google, who can much more easily switch from GPUs to their own TPUs. ChatGPT could have some growing pains there; Google would not.
> I don't think this seems particularly bad for ChatGPT. They've built a strong brand. This should just help them reduce - by far - one of their largest expenses.
Often expenses like that are keeping your competitors away.
Yes, but it typically doesn't matter if someone can reach parity or even surpass you - they have to surpass you by a step function to take a significant number of your users.
This is a step function in terms of efficiency (which presumably will be incorporated into ChatGPT within months), but not in terms of end user experience. It's only slightly better there.
One data point, but my ChatGPT subscription is cancelled every time, so I make the decision to resubscribe fresh every month. And because the cost of switching is essentially zero, the moment a better service shows up I will switch in an instant.
This assumes no (or very small) diminishing returns effect.
I don't pretend to know much about the minutiae of LLM training, but it wouldn't surprise me at all if throwing massively more GPUs at this particular training paradigm only produces marginal increases in output quality.
I believe the margin to expand is on CoT, where tokens can grow dramatically. If there is value in putting more compute towards it, there may still be returns to be captured on that margin.
Would it not be useful to have multiple independent AIs observing and interacting to build a model of the world? I'm thinking something roughly like the "counselors" in the Civilization games, giving defense/economic/cultural advice, but generalized over any goal-oriented scenario (and including one to take the "user" role). A group of AIs with specific roles interacting with each other seems like a good area to explore, especially now, given the downward scalability of LLMs.
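A rough sketch of what that could look like, assuming a generic chat-completion function. `ask_llm` is a hypothetical stand-in for whatever model API you'd actually call; nothing here reflects a specific product's design:

```python
# Hypothetical 'council of AIs': role-prompted agents sharing one transcript.
from typing import Dict, List

def ask_llm(system_prompt: str, transcript: str) -> str:
    # Hypothetical stub: replace with a real LLM call.
    return f"[{system_prompt.split(':')[0]} weighs in on: ...{transcript[-50:]}]"

ROLES: Dict[str, str] = {
    "defense":  "Defense counselor: assess risks and threats to the plan.",
    "economic": "Economic counselor: assess costs, budgets, and trade-offs.",
    "cultural": "Cultural counselor: assess morale and public perception.",
    "user":     "User proxy: restate the goal and judge the advice given.",
}

def council_rounds(goal: str, rounds: int = 2) -> List[str]:
    """Each counselor sees the shared transcript and adds its advice;
    the transcript grows, so later turns can react to earlier ones."""
    transcript = f"Goal: {goal}"
    log: List[str] = []
    for _ in range(rounds):
        for role, prompt in ROLES.items():
            reply = ask_llm(prompt, transcript)
            transcript += f"\n{role}: {reply}"
            log.append(f"{role}: {reply}")
    return log

for line in council_rounds("expand into the northern territory"):
    print(line)
```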
This is exactly where DeepSeek's enhancements come into play. Essentially, DeepSeek lets the model think out loud via chain of thought (o1 and Claude also do this), but DeepSeek also does not supervise the chain of thought, and simply rewards CoTs that get the answer correct. This is just one of the half-dozen training optimizations that DeepSeek has come up with.
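For the curious, here's a minimal sketch of that outcome-only reward loop, in the spirit of group-relative baselines (DeepSeek used GRPO). `sample_cot` and `policy_update` are hypothetical stubs, not DeepSeek's actual training code:

```python
# Sketch: sample several CoTs per question, reward only the final answer.
import random
from typing import List, Tuple

def sample_cot(question: str) -> Tuple[str, str]:
    # Hypothetical stub: the model emits a chain of thought and an answer.
    thought = f"step-by-step reasoning about '{question}'..."
    answer = random.choice(["42", "7", "100"])
    return thought, answer

def policy_update(cot: str, advantage: float) -> None:
    # Hypothetical stub: nudge the policy towards (advantage > 0)
    # or away from (advantage < 0) this chain of thought.
    pass

def train_step(question: str, gold_answer: str, group_size: int = 8) -> None:
    """The chain of thought itself is never graded; only the outcome is.
    Whatever reasoning style reliably reaches correct answers gets
    reinforced organically."""
    samples = [sample_cot(question) for _ in range(group_size)]
    rewards: List[float] = [1.0 if ans == gold_answer else 0.0
                            for _, ans in samples]
    baseline = sum(rewards) / len(rewards)  # group-relative baseline
    for (thought, _), r in zip(samples, rewards):
        policy_update(thought, advantage=r - baseline)

train_step("What is 6 * 7?", gold_answer="42")
```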