Hacker News
Show HN: Do we need MCPs? Reverse-engineered Slack and Linear API for Evals & RL (agentdiff.dev)
11 points by hubertmarek 48 days ago | 4 comments


Thanks for sharing, and I love the transparency of sharing the test results too. Mildly curious: why did you choose Slack and Linear? Why not something else?


Hi HN, I noticed it is almost impossible to run evals or train models on third-party integrations, so I built interactive environments for them. Feedback is more than welcome. Thanks!

Interesting fact: in evals on 40 tasks against the Linear API, most frontier models scored surprisingly well:

- Claude Opus 4.5: 95% (38/40)
- GLM 4.6: 87.5% (35/40)
- Claude Sonnet 4.5: 85% (34/40)
- Claude Haiku 4.5: 82.5% (33/40)
- Kimi K2: 82.5% (33/40)
- Grok 4.1 Fast: 80% (32/40)
- GPT 5.1: 77.5% (31/40)
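
As a quick sanity check on the scores above, here is a minimal sketch that recomputes each pass rate from the pass counts out of 40 tasks. The names `passed` and `pass_rate` are made up for illustration and are not part of the project; only the counts come from the list above.

```python
# Pass counts copied from the list above; every model ran the same 40 Linear tasks.
TOTAL_TASKS = 40

passed = {
    "Claude Opus 4.5": 38,
    "GLM 4.6": 35,
    "Claude Sonnet 4.5": 34,
    "Claude Haiku 4.5": 33,
    "Kimi K2": 33,
    "Grok 4.1 Fast": 32,
    "GPT 5.1": 31,
}


def pass_rate(n_passed, total=TOTAL_TASKS):
    """Return the pass rate as a percentage (hypothetical helper)."""
    return 100 * n_passed / total


rates = {model: pass_rate(n) for model, n in passed.items()}

# Print highest score first; ties keep the order they were listed in.
for model, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rate:g}% ({passed[model]}/{TOTAL_TASKS})")
```

With 40 tasks, each task is worth 2.5 percentage points, so small gaps between models (for example 82.5% vs 80%) come down to a single task.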

This makes me wonder whether we really need to reinvent the wheel and build special interfaces (MCPs) for agents interacting with services, when they can just use the APIs as they are.


Super interesting! At my company we have our agent writing code to make API calls, and we were looking for a way to evaluate our agent on exactly that! The problem with doing that yourself against the Gmail, Linear, or Slack API is that you quickly hit rate limits, but if we have a copy of the API, problem solved.

Will definitely try this!


Were rate limits the main blocker for you?



