
Is there a human-play benchmark (even informally) for this style of interface? Not saying it's necessary or even relevant, I'm just curious to know what programmatic Factorio feels like -- I imagine spatial reasoning around text prompts would be fairly challenging for human players to navigate as well.


Human benchmarks for Factorio are speedruns: rushing to launch the first rocket. The current record is just over 4 hours for one player, and 90 minutes for a team. Even from those numbers, you can see that a multi-tasking LLM has room to outperform humans.


The current 4h12m record is for 100% (where you have to get every single achievement in the game in a single run); any% (where you just need to launch a rocket) is under 2 hours (1h42 for the latest Factorio v2.x, 1h18 for v1.x). There are a few other differences between the categories regarding map selection and blueprint use as well.

Records and specific rules for all categories can be found at https://www.speedrun.com/factorio


I think he is talking about a human using the programmatic API the LLMs are using to play the game. I think that would be a whole lot slower than a normal playthrough.


We were able to pass all the early lab tasks manually - although it took a lot longer than using the UI!



