Currently it's a text-only modality environment but we are planning to support v...

vessenes · 2025-03-11T15:39:31 1741707571

That makes sense and it’s really interesting - it is a challenging visual test for sure; thousands of entities, either multi tier visual representations (screen, map, overview map) or a GIANT high res image. I hereby propose FLE-V a subset benchmark for visual models where they just turn a factorio image into a proper FLE description. And maybe the overview and map images as well.

kridsdale1 · 2025-03-11T15:54:28 1741708468

Such research could have hundreds of billions of dollars in downstream GDP implications when applied to real industrial settings.

dismalpedigree · 2025-03-11T22:02:02 1741730522

Not to mention the increased productivity of everyone not wasting their time in factorio (myself included) because the optimal solution is known.

lukan · 2025-03-12T05:33:02 1741757582

Not wasted time, you were doing research it seems.

dismalpedigree · 2025-03-13T01:44:34 1741830274

Good point. My wife will surely understand if I explain it as “research”

vessenes · 2025-03-11T16:36:13 1741710973

Well I better get training!

grayhatter · 2025-03-11T16:19:20 1741709960

> As the complexity of the game state grew and the screenshots were filled with more entities, the models got even more confused and started hallucinating directions, entities etc or weren't capable of troubleshooting factories with apparent mistakes (i.e missing transport belt, wrongly rotated inserter). We think it's because [...]

I think you just described a research paper that would advance sota. Less describing why, but how. (Assuming it's not just, wy finetuned the model and it worked perfectly)

martbakler · 2025-03-11T17:19:56 1741713596

Sounds almost like a visual "needle in a haystack" type of work, that could be quite interesting!

pyinstallwoes · 2025-03-12T00:18:09 1741738689

Where’s Waldo test for vlm