johndough 3 months ago | on: OpenCoder: Open Cookbook for Top-Tier Code Large L...

I was wondering why Figure 1 shows a HumanEval score of 61.6 for Qwen2.5-Coder-7B while Table 1 shows a score of 88.4, i.e. better than this new model with its score of 66.5.

The reason is that those are actually two different models (Qwen2.5-Coder-7B-Base with 61.6, Qwen2.5-Coder-7B-Instruct with 88.4).