Refuel LLM (84.2%) outperforms trained human annotators (80.4%), GPT-3.5-turbo (81.3%), PaLM-2 (82.3%), and Claude (79.3%) across a benchmark of 15 text labeling datasets. It is built on a Llama-v2-13b base model and trained on over 2,500 unique datasets (5.24B tokens) spanning categories such as classification, entity resolution, matching, reading comprehension, and information extraction.
Here is the interactive demo: https://labs.refuel.ai/playground (pretty fun to play with!)