Show HN: I implemented eval metrics for LLMs that run locally on your machine (github.com/confident-ai)
22 points by 3d27 on Dec 11, 2023 | 3 comments



Oh cool - we just did a writeup on Ragas so will check this one out!


The idea around a testing framework for LLMs is nice. Can you provide some examples where this can be used?


The package I built provides 10+ different evaluation metrics that run locally on your machine using models from Hugging Face, and can also run in the cloud if you want more functionality.
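
To make that concrete, here's a rough, self-contained sketch of what a locally run metric boils down to, using a Hugging Face sentence-transformers model to score answer relevancy. This is an illustration of the idea, not the package's actual implementation:

    # Rough sketch of a locally run metric: score answer relevancy as the
    # cosine similarity between the question and the model's answer, using
    # a Hugging Face sentence-transformers model. Illustration only, not
    # the package's actual implementation.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # downloads once, then runs fully locally

    def answer_relevancy(question: str, answer: str) -> float:
        q_emb = model.encode(question, convert_to_tensor=True)
        a_emb = model.encode(answer, convert_to_tensor=True)
        return util.cos_sim(q_emb, a_emb).item()

    score = answer_relevancy(
        "What is the capital of France?",
        "The capital of France is Paris.",
    )
    print(f"answer relevancy: {score:.2f}")  # pass/fail against whatever threshold you pick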

If you want to evaluate a fine-tuned model, we have integrations with LM Harness and Stanford HELM coming out. If you want to evaluate a RAG application, we have 7+ metrics available for that.
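
For example, one way to think about a RAG metric like contextual relevancy is the fraction of retrieved chunks that are actually relevant to the question. A rough sketch of that idea (the scoring rule here is illustrative, not exactly how the package computes it):

    # Rough sketch of one RAG-style metric, "contextual relevancy": the
    # fraction of retrieved chunks that are actually relevant to the question.
    # The scoring rule is an illustrative assumption, not necessarily how the
    # package computes it.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def contextual_relevancy(question, retrieval_context, threshold=0.4):
        q_emb = model.encode(question, convert_to_tensor=True)
        ctx_emb = model.encode(retrieval_context, convert_to_tensor=True)
        sims = util.cos_sim(q_emb, ctx_emb)[0]  # similarity of the question to each chunk
        relevant = sum(1 for s in sims if s.item() >= threshold)
        return relevant / len(retrieval_context)

    score = contextual_relevancy(
        "When was the Eiffel Tower built?",
        [
            "The Eiffel Tower was completed in 1889 for the World's Fair.",
            "Paris is known for its cafes and museums.",
        ],
    )
    print(f"contextual relevancy: {score:.2f}")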

You can also create your own custom metrics using our interface!
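
As a simplified illustration of the pattern (the stand-in classes below are defined just for this example, not the package's exact interface): subclass a base metric, implement measure(), and report a score plus pass/fail.

    # Simplified illustration of the custom-metric pattern. TestCase and
    # BaseMetric are stand-ins defined here for the example, not the
    # package's exact interface.
    from dataclasses import dataclass

    @dataclass
    class TestCase:
        input: str
        actual_output: str

    class BaseMetric:
        score: float = 0.0
        threshold: float = 0.5

        def measure(self, test_case: TestCase) -> float:
            raise NotImplementedError

        def is_successful(self) -> bool:
            return self.score >= self.threshold

    class AnswerLengthMetric(BaseMetric):
        """Toy custom metric: pass if the answer stays within a word budget."""

        def __init__(self, max_words: int = 50):
            self.max_words = max_words

        def measure(self, test_case: TestCase) -> float:
            words = len(test_case.actual_output.split())
            self.score = max(0.0, 1.0 - words / self.max_words)
            return self.score

    metric = AnswerLengthMetric(max_words=50)
    metric.measure(TestCase(input="Summarize the return policy.",
                            actual_output="Items can be returned within 30 days."))
    print(metric.score, metric.is_successful())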



