Show HN: Talc – Custom benchmarking for LLM apps (talc.ai)
1 point by maxrmk on Nov 1, 2023
Hey HN! We recently launched our tool for testing AI systems. The goal is to make it really easy for teams to maintain benchmarks for things like "factual accuracy in QA". So if you're building a customer support bot, you can test (during development) how often it lies about your products. It's all automatically graded, and shows you only the interesting results.

It's essentially a scaled-up version of manually entering a bunch of test cases and checking how the system performs on them.
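
For anyone new to the idea, here's a rough sketch of what "enter test cases, grade them automatically, show only the interesting results" looks like in its simplest form. This is not Talc's implementation or API: `TestCase`, `run_benchmark` and `toy_bot` are hypothetical names, and the case-insensitive substring check is a deliberately naive stand-in for a real grader (e.g. an LLM judge).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    question: str
    expected_fact: str  # hypothetical: a fact the answer must contain to count as correct


def run_benchmark(
    bot: Callable[[str], str], cases: list[TestCase]
) -> tuple[float, list[tuple[TestCase, str]]]:
    """Run every case through the bot, grade it, and return (accuracy, failures)."""
    failures = []
    for case in cases:
        answer = bot(case.question)
        # Naive grader (an assumption): pass if the expected fact appears in the answer.
        if case.expected_fact.lower() not in answer.lower():
            failures.append((case, answer))
    accuracy = 1 - len(failures) / len(cases) if cases else 0.0
    return accuracy, failures


def toy_bot(question: str) -> str:
    # Hypothetical stand-in for a customer support bot.
    if "refund" in question:
        return "Refunds are available within 30 days."
    return "Our product ships worldwide."


cases = [
    TestCase("What is the refund window?", "30 days"),
    TestCase("Do you ship to Canada?", "ships worldwide"),
    TestCase("Is there a free tier?", "free tier"),
]
accuracy, failures = run_benchmark(toy_bot, cases)
print(f"accuracy: {accuracy:.2f}")
# Only the failing cases are surfaced, since those are the interesting ones.
for case, answer in failures:
    print(f"FAIL: {case.question!r} -> {answer!r}")
```

Here the bot passes the first two cases and fails the third, so only that one is printed. Scaling this up mostly means generating many more cases and using a grader that can judge factual accuracy rather than exact wording.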

If you're interested in LLM testing, evals, or benchmarking, let's chat!