The article makes it seem like finding diamonds is some kind of super complicated logical puzzle. In reality, the hardest part is knowing where to look for them and what tool you need to mine them without losing them once you find them. This knowledge was given to the AI by having it watch a video that explains it.
If you watch a guide on how to find diamonds it's really just a matter of getting an iron pickaxe, digging to the right depth and strip mining until you find some.
Hi, author here! Dreamer learns to find diamonds from scratch by interacting with the environment, without access to external data. So there are no explainer videos or internet text here.
It gets a sparse reward of +1 for each of the 12 items that lead to the diamond, so there is a lot it needs to discover by itself. Fig. 5 in the paper shows the progression: https://www.nature.com/articles/s41586-025-08744-2
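In case a concrete picture helps, here is a rough Python sketch of what such a sparse milestone reward could look like. The item names, the `inventory` dict, and the bookkeeping are my own placeholders following the usual Minecraft tech tree toward diamond, not the paper's actual implementation.

```python
# Illustrative sketch of a "+1 per first-time milestone" reward signal.
# The 12 item names roughly follow the Minecraft tech tree toward diamond;
# the exact list and bookkeeping in the paper may differ.

MILESTONES = [
    "log", "plank", "stick", "crafting_table", "wooden_pickaxe",
    "cobblestone", "stone_pickaxe", "iron_ore", "furnace",
    "iron_ingot", "iron_pickaxe", "diamond",
]

def milestone_reward(inventory, achieved):
    """Return +1 for each milestone item obtained for the first time this episode."""
    reward = 0.0
    for item in MILESTONES:
        if item not in achieved and inventory.get(item, 0) > 0:
            achieved.add(item)   # count each milestone only once per world
            reward += 1.0
    return reward

# Usage inside an episode loop (hypothetical observation format):
# achieved = set()                      # reset when the world resets
# r = milestone_reward(obs["inventory"], achieved)
```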
Since diamonds are surrounded by danger, and if it dies it loses its items and such, why would it not be satisfied after discovering the iron pickaxe or some such? Is it in a mode where it doesn't lose its items when it dies? Does it die a lot? Does it ever try digging vertically down? Does it ever discover other items/tools you didn't expect it to? Open world with sparse reward seems like such a hard problem. Also, once it gets the item, does it stop getting reward for it? I assume so. Surprised that it can work with this level of sparse rewards.
In all reinforcement learning there is some impetus for exploration, either explicitly as part of a fitness function or implicitly as part of the algorithm. It might be adding a tiny reward per square walked, a small reward for each block broken, and a larger one for each new block type broken. Or it could be just forcing a random move every N steps so the agent encounters new situations through "clumsiness".
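To make that concrete, here is a toy sketch of both flavours: a small intrinsic bonus that grows when a new block type is broken, plus an occasional forced random action. The reward values, `every_n`, and the function names are made up for illustration; this is not claimed to be what Dreamer itself uses.

```python
import random

def exploration_bonus(block_type, seen_types, per_block=0.01, per_new_type=0.1):
    """Tiny reward for any broken block, a larger one the first time a new type is broken."""
    bonus = per_block
    if block_type not in seen_types:
        seen_types.add(block_type)
        bonus += per_new_type
    return bonus

def maybe_randomize(action, step, action_space, every_n=50):
    """Force a random action every N steps so the agent stumbles into new situations."""
    if step % every_n == 0:
        return random.choice(action_space)
    return action
```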
When it dies it loses all items and the world resets to a new random seed. It learns to stay alive quite well but sometimes falls into lava or gets killed by monsters.
It only gets a +1 for the first iron pickaxe it makes in each world (same for all other items), so it can't hack rewards by repeating a milestone.
Yeah it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.
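For anyone wondering what "imagining scenarios in parallel using the world model" means mechanically, here is a heavily simplified sketch: a learned model predicts the next latent state and reward, and returns are computed from rollouts that never touch the real environment. The function names, shapes, and stand-in dynamics are mine; the real DreamerV3 machinery is far more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model_step(latent, action):
    """Stand-in dynamics: predict next latent state and reward from (latent, action)."""
    next_latent = np.tanh(latent + 0.1 * action + 0.01 * rng.normal(size=latent.shape))
    reward = float(next_latent.mean())   # placeholder for a learned reward head
    return next_latent, reward

def imagine(policy, start_latents, horizon=15):
    """Roll out many imagined trajectories in parallel, entirely inside the model."""
    latents = list(start_latents)
    total_return = np.zeros(len(latents))
    for _ in range(horizon):
        actions = policy(latents)                        # actor proposes an action per latent
        steps = [world_model_step(z, a) for z, a in zip(latents, actions)]
        latents = [s[0] for s in steps]
        total_return += np.array([s[1] for s in steps])  # imagined returns train actor/critic
    return total_return

# Usage: many imagined rollouts from latent states (here just random vectors);
# a real agent would start from states encoded out of the replay buffer.
# random_policy = lambda zs: [rng.normal(size=z.shape) for z in zs]
# returns = imagine(random_policy, [rng.normal(size=8) for _ in range(1024)])
```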
> Yeah it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.
This is such gold. Thanks for sharing. Immediately added to my notes.
I just want to express my condolences for how difficult it must be to correct basic misunderstandings that could be cleared up immediately by reading the fourth paragraph under the section "Diamonds are forever".
It didn't watch 'a video'; it watched many, many hours of video of people playing Minecraft (with another specialised model feeding in predictions of the keyboard and mouse inputs from the footage). It's still a neat trick, but it's far from the implied one-shot learning.
I don't think it was videos. Almost certainly it was replay files, with a bunch of work to transform them into something that could be compared to the model's outputs. (AlphaStar never 'sees' the game's interface, only a transformed version of the information available via an API.)
StarCraft provides replay files that start with the initial game state and then record every action in the game. Not user inputs, but the actions bound to them.
The other replies have observed that the AI didn't get any "videos to watch", but I'd also observe that this is being used as an English colloquialism. The AIs aren't "watching videos"; they're receiving videos as their training data. That's quite different from what "watching a video" brings to mind, as if the AI watched a single YouTube tutorial video once and got the concept.
I feel like you are jumping to conclusions here. I wasn't talking about the achievement or the AI; I was talking about the article and the way it explains finding diamonds in Minecraft to people who don't know how to find them.
I am not American, nor do I live in America, so I don't really have a horse in this race, but the DOGE approach seems to be the classic "move fast and break things" approach. The reactions to it are the classic reactions to that approach: competent people speak out to get broken things fixed, and others are confused about what is happening.
I did talk to a doctor. He quoted me $12,000 for a surgery that sounded excessive and had a long recovery time. I try to get second opinions, but doctors are so busy that I either never get a call back or get scheduled many months out.
Oddly, the wealthier I get, the more I distrust doctors. Why perform a $300 tooth filling, for example, when you can creatively justify a $5,000 root canal and crown? They know I have the money, and their kids' private school ain't cheap.
Before "Information Systems" there was no need for a company/college wide network; computers with printers on the same desk were a replacement for the typewriter.
I don't share my data with anyone so I don't need cloud storage.
SSDs are so cheap at this point that I just buy a new SSD every few years and install a fresh copy of Windows there. Old drive gets unplugged and "archived" on the shelf.
If I need to access the "archive" I use a SATA to USB cable and plug the drive in.
I avoid large-capacity drives (my biggest one is about 200 GB) and use SpaceSniffer so I'm forced to keep the data I store lean.
That's the easy part. The hard part is getting people to admit that the metric has been discovered already. Most problems with automation are organizational problems, not technical ones.