jwyang's comments

jwyang · 2025-02-20T08:04:13

Very good question. Right now we are mainly focusing on building the foundation for multimodal perception and atomic action taking. Of course, integrating trace-of-mark prediction on robotics and human video data enhances the model's medium-length reasoning, but this is certainly not sufficient. The current Magma model will serve as the basis for our next step, i.e., longer-horizon reasoning and planning! This is exactly what we are looking at for the next version of Magma!

jwyang · 2025-02-20T06:29:52

Thanks for your great interest in our Magma work, everyone!

We will gradually roll out the inference, training, evaluation, and data preprocessing code in our codebase: https://github.com/microsoft/Magma, and this will be finished by next Tuesday. Stay tuned!

dr_dshiv · 2025-02-20T12:26:56

How far are we from making peanut butter sandwiches? Is that a valid benchmark to look towards, in this space?

jwyang · 2025-02-20T06:26:23

Good catch! A minor correction: Magma = M(ultimodal) Ag(entic) M(odel) at M(icrosoft) (Rese)A(rch); the last part is similar to how the name Llama came about :)

throw310822 · 2025-02-20T07:29:12

How many 'M's in "Magma"? ;)

jwyang · 2025-02-20T07:34:30

Oops, a typo: no M from Microsoft.

manojlds · 2025-02-20T09:06:31

It's ok GPT

exe34 · 2025-02-20T10:22:05

there are two r.

jwyang · on Nov 9, 2023

Here are other related links:

code: https://github.com/microsoft/SoM

demo: https://user-images.githubusercontent.com/3894247/281586831-...