Is there a human-play benchmark (even informally) for this style of interface? N...

sonofhans · 2025-03-11T19:35:38

Human benchmarks for Factorio are speedruns: rushing to launch the first rocket. The current record is just over 4 hours for one player, and 90 minutes for a team. From that alone, you can see that a multitasking LLM has room to outperform humans.

janzer · 2025-03-11T23:01:38

The current 4h12m record is for 100% (where you have to get every single achievement in the game in one run). Any% (where you just need to launch a rocket) is under 2 hours: 1h42m for the latest Factorio v2.x and 1h18m for v1.x. There are also a few other differences between the categories regarding map selection and blueprint use.

Records and specific rules for all categories can be found at https://www.speedrun.com/factorio

goriv · 2025-03-11T22:46:06

I think he is talking about a human using the programmatic API that the LLMs use to play the game. I think that would be a whole lot slower than a normal playthrough.

noddybear · 2025-03-12T11:57:51

We were able to pass all the early lab tasks manually, although it took a lot longer than using the UI!