paradite on April 2, 2024, on: Princeton group open sources "SWE-agent", with 12%...

For anyone who didn't bother looking deeper, the SWE-bench benchmark contains only Python projects, so it is not representative of all programming languages and frameworks. I'm working on a more general SWE task eval framework in JS for arbitrary languages and frameworks (JS/TS, SQL, and Python for starters), for my own prompt engineering product. Hit me up if you are interested.

I'm assuming the dataset is proprietary; if not, please share the repo.