Hacker News | reportgunner's comments

The article makes it seem like finding diamonds is some kind of super complicated logical puzzle. In reality the hardest part is knowing where to look for them and what tool you need to mine them without losing them once you find them. This was given to the AI by having it watch a video that explains it.

If you watch a guide on how to find diamonds, it's really just a matter of getting an iron pickaxe, digging to the right depth, and strip mining until you find some.


Hi, author here! Dreamer learns to find diamonds from scratch by interacting with the environment, without access to external data. So there are no explainer videos or internet text here.

It gets a sparse reward of +1 for each of the 12 items that lead to the diamond, so there is a lot it needs to discover by itself. Fig. 5 in the paper shows the progression: https://www.nature.com/articles/s41586-025-08744-2


Since diamonds are surrounded by danger, and if it dies it loses its items and such, why would it not be satisfied after discovering the iron pickaxe or some such? Is it in a mode where it doesn't lose its items when it dies? Does it die a lot? Does it ever try digging vertically down? Does it ever discover other items/tools you didn't expect it to? An open world with sparse rewards seems like such a hard problem. Also, once it gets an item, does it stop getting rewarded for it? I assume so. I'm surprised that it can work with rewards this sparse.


In all reinforcement learning there is (explicitly as part of a fitness function, or implicitly as part of the algorithm) some impetus for exploration. It might be adding a tiny reward per square walked, a small reward for each block broken and a larger one for each new block type broken. Or it could be just forcing a random move every N steps so the agent encounters new situations through “clumsiness”.
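A minimal sketch of the shaped-reward idea above, assuming a toy interface where the environment reports which block type (if any) was broken on each step. The class name `NoveltyBonus` and the reward sizes are hypothetical choices for illustration; nothing here comes from the paper.

```python
from typing import Optional, Set


class NoveltyBonus:
    """Hypothetical exploration bonus: a tiny reward for every block broken,
    plus a larger one the first time each block type is broken."""

    def __init__(self, per_block: float = 0.01, per_new_type: float = 0.1) -> None:
        self.per_block = per_block
        self.per_new_type = per_new_type
        self.seen_types: Set[str] = set()

    def reward(self, broken_block: Optional[str]) -> float:
        if broken_block is None:  # nothing was broken this step
            return 0.0
        bonus = self.per_block
        if broken_block not in self.seen_types:  # first time breaking this type
            self.seen_types.add(broken_block)
            bonus += self.per_new_type
        return bonus
```

In training, this bonus would be added to the task reward at each step. Because `seen_types` only grows, the novelty part stops paying out once a block type has been broken, leaving only the small per-block term.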


That's right; there is usually a parameter in the action-selection function that sets the exploration vs. exploitation balance.
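The textbook example of such a parameter is ε in ε-greedy action selection: with probability ε the agent acts at random (explore), otherwise it takes the action it currently rates highest (exploit). A minimal sketch, assuming a discrete action set with one value estimate per action in `q_values`; this is the generic technique, not necessarily how Dreamer explores (its actor samples from a learned stochastic policy).

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Return a random action index with probability epsilon, else the best-valued one."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit: greedy action
```

A seeded `random.Random` keeps runs reproducible. Annealing ε from near 1 toward a small value over training is a common way to shift gradually from exploring to exploiting.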


When it dies it loses all items and the world resets to a new random seed. It learns to stay alive quite well but sometimes falls into lava or gets killed by monsters.

It only gets a +1 for the first iron pickaxe it makes in each world (same for all other items), so it can't hack rewards by repeating a milestone.
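The reward rule described here can be sketched as a small tracker. This is a hypothetical illustration (the class name and the `step`/`reset` interface are made up), not code from the paper; it only encodes the two stated rules: +1 the first time each of the 12 milestone items is obtained in a world, and a fresh slate when the world resets.

```python
from typing import Iterable, Set

# The 12-item chain leading to the diamond, as listed in the paper.
MILESTONES = (
    "log", "plank", "stick", "crafting table", "wooden pickaxe", "cobblestone",
    "stone pickaxe", "iron ore", "furnace", "iron ingot", "iron pickaxe", "diamond",
)


class MilestoneReward:
    """+1 the first time each milestone item is obtained in a world; 0 afterwards."""

    def __init__(self, milestones: Iterable[str] = MILESTONES) -> None:
        self.milestones = frozenset(milestones)
        self.collected: Set[str] = set()

    def reset(self) -> None:
        # Called when the agent dies and a new random world is generated.
        self.collected.clear()

    def step(self, new_items: Iterable[str]) -> float:
        reward = 0.0
        for item in new_items:
            if item in self.milestones and item not in self.collected:
                self.collected.add(item)
                reward += 1.0
        return reward
```

Because `collected` is only cleared on `reset`, making a second iron pickaxe in the same world earns nothing, which is what stops the agent from farming a milestone.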

Yeah it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.
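To make "imagining a lot of scenarios" a bit more concrete: Dreamer-style agents roll the learned world model forward to produce imagined trajectories of predicted rewards and values, then train the policy on λ-returns computed from them. The sketch below shows only that last scoring step, under stated assumptions: the rewards and values are plain lists standing in for world-model and critic outputs, γ=0.99 and λ=0.95 are illustrative defaults, and details of the actual paper (continuation flags, return normalisation) are omitted.

```python
from typing import List, Sequence


def lambda_returns(rewards: Sequence[float], values: Sequence[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """λ-returns for one imagined trajectory.

    rewards[t] is the predicted reward at step t (length H);
    values[t] is the critic's estimate v(s_t) (length H + 1, the last entry bootstraps).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    returns = [0.0] * len(rewards)
    next_return = values[-1]  # bootstrap from the value of the final imagined state
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap v(s_{t+1}) with the longer return R_{t+1}.
        next_return = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_return)
        returns[t] = next_return
    return returns


def batch_lambda_returns(batch_rewards: Sequence[Sequence[float]],
                         batch_values: Sequence[Sequence[float]],
                         gamma: float = 0.99, lam: float = 0.95) -> List[List[float]]:
    # Imagined trajectories are independent, so in practice they are scored in parallel.
    return [lambda_returns(r, v, gamma, lam) for r, v in zip(batch_rewards, batch_values)]
```

With λ=0 this reduces to one-step targets r_t + γ·v(s_{t+1}); with λ=1 it becomes the full discounted sum of imagined rewards plus the final bootstrap value.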


> Yeah it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.

This is such gold. Thanks for sharing. Immediately added to my notes.


I just want to express my condolences for how difficult it must be to correct basic misunderstandings that could be cleared up immediately by reading the fourth paragraph under the section "Diamonds are forever".

Thanks for your hard work.


Haha thanks!


For the curious, from the link above:

> log, plank, stick, crafting table, wooden pickaxe, cobblestone, stone pickaxe, iron ore, furnace, iron ingot, iron pickaxe and diamond


While I agree with your comment, this sentence:

"This was given to the AI by having it watch a video that explains it."

Just a few months ago, this was not as trivial as it may seem...


EDIT: Incorrect, see below

It didn't watch 'a video'; it watched many, many hours of video of people playing Minecraft (with another specialised model feeding in predictions of keyboard and mouse inputs from the video). It's still a neat trick, but it's far from the implied one-shot learning.


The author replied in this thread and says the opposite.


Ah, I was incorrect. I got that impression from one of the papers linked at the end of the article, but I suspect that's actually some previous work.


I applaud you for acknowledging your mistake. So many people double down, especially in this pernicious and polarized age.


AlphaStar was also initially trained on YouTube videos of pros playing StarCraft. I would argue that this was pretty trivial a few years ago.


I don't think it was videos. Almost certainly it was replay files, with a bunch of work to transform them into something that could be compared to the model's outputs. (AlphaStar never 'sees' the game's interface, only a transformed version of the information available via an API.)


This was my understanding as well, as the replay files are all available anyway.

The YouTube documentary is actually very detailed about how they implemented everything.


Which documentary? Is it this one?

https://www.youtube.com/watch?v=UuhECwm31dM


It was a ~1h documentary


Do you know if it was actual videos or some simpler inputs like game state and user inputs? I’d be impressed if it was the former at that time.


StarCraft provides replay files that record the initial game state followed by every action in the game: not raw user inputs, but the actions bound to them.
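That format works because the game simulation is deterministic: the same starting state plus the same sequence of actions reproduces the whole game when re-simulated. A toy sketch of the idea, with a made-up state (just minerals and workers) and made-up action names such as `train_worker`; the real replay format and game engine are far richer.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class GameState:
    minerals: int
    workers: int


# A replay: the initial state plus the recorded (game_loop, action) pairs, in order.
Replay = Tuple[GameState, List[Tuple[int, str]]]


def apply_action(state: GameState, action: str) -> GameState:
    # Hypothetical deterministic rules: same state + same action -> same result.
    if action == "train_worker" and state.minerals >= 50:
        return replace(state, minerals=state.minerals - 50, workers=state.workers + 1)
    if action == "gather":
        return replace(state, minerals=state.minerals + 5 * state.workers)
    return state  # unknown or unaffordable actions change nothing


def replay_game(replay: Replay) -> List[GameState]:
    """Re-simulate the game from its initial state, returning every intermediate state."""
    state, actions = replay
    states = [state]
    for _game_loop, action in actions:  # actions are applied in recorded order
        state = apply_action(state, action)
        states.append(state)
    return states
```

Note that the log stores game-level actions like `train_worker`, not the key presses that triggered them, which matches the description above.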


>This was given to the AI by having it watch a video that explains it.

That is not what the article says. It says that was separate, previous research.


I don't get it. How can you reduce this achievement to this?

Have you gotten used to some AI watching a video and 'getting it' so fast that this is boring? Unimpressive?


The other replies have observed that the AI didn't get any "videos to watch", but I'd also observe that this is being used as an English colloquialism. The AIs aren't "watching videos"; they're receiving videos as their training data. That's quite different from what comes to mind when you hear "watching a video", as if the AI watched a single YouTube tutorial once and got the concept.


I feel like you are jumping to conclusions here. I wasn't talking about the achievement or the AI; I was talking about the article and the way it explains finding diamonds in Minecraft to people who don't know how to find diamonds in Minecraft.


The AI is able to learn from video and you don't find that even a little bit impressive? Well I disagree.



Financial collapse? Surely we can just roll out AI-powered money printers and make them go BRRR /s


I am not American, nor do I live in America, so I don't really have a horse in this race, but the DOGE approach seems to be the classic "move fast and break things" approach. The reactions to it are the classic reactions to that approach: competent people speak out to get broken things fixed, and others are confused about what is happening.


Something is really wrong with you if you think of a targeted ad instead of a doctor when you have a medical problem.


I did talk to a doctor. He quoted me $12,000 for a surgery that sounded excessive and had a long recovery time. I try to get second opinions, but doctors are so busy that I either never get a call back or can only schedule many months out.

Oddly, the wealthier I get, the more I distrust doctors. Why perform a $300 tooth filling, for example, when you can creatively justify a $5,000 root canal and crown? They know I have the money, and their kids' private school ain't cheap.


nostalgia


Before "Information Systems", there was no need for a company- or college-wide network; computers with printers on the same desk were a replacement for the typewriter.


It's about finding a simple (enough) way through a (seemingly) complicated system.


YES! Clearly an experienced developer in the house here. Your effortless focus on the key thing reveals it. You cannot hide!


I don't share my data with anyone, so I don't need cloud storage. SSDs are so cheap at this point that I just buy a new SSD every few years and install a fresh copy of Windows on it. The old drive gets unplugged and "archived" on the shelf.

If I need to access the "archive", I use a SATA-to-USB cable and plug the drive in.

I avoid large-capacity drives (my biggest one is about 200GB) and use SpaceSniffer, so I'm forced to keep the data I store lean.


That's the easy part. The hard part is getting people to admit that the metric has been discovered already. Most problems with automation are organizational problems not technical ones.


Well, yeah, 10% of a big number is always going to be a big number; that's how math works.

You can take all the salaries in the world, assume you can automate some percentage of that, and you will consistently arrive at a big number.

