When generating synthetic data with LLMs (GPT-4, Claude, …) or diffusion models (DALL·E 3, Stable Diffusion, Midjourney, …), how do you evaluate how good it is?
Introducing: quality scores to systematically evaluate a synthetic dataset with just one line of code! Use Cleanlab’s synthetic dataset scores to rigorously guide your prompt engineering (a much better signal than manually inspecting samples). These scores also help you tune the settings of any synthetic data generator (e.g. GAN or probabilistic model hyperparameters) and compare different synthetic data providers.
Cleanlab scores comprehensively evaluate a synthetic dataset for different shortcomings, including: unrealistic examples, low diversity, overfitting/memorization of real data, and underrepresentation of certain real scenarios. These scores are universally applicable to image, text, and structured/tabular data!
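To build intuition for what these shortcomings look like, here is a minimal sketch of nearest-neighbor checks you could run on embeddings of your real vs. synthetic examples. This is not Cleanlab’s implementation or API (the function name and heuristics below are hypothetical); the one-line Cleanlab scores are shown in the linked tutorial.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def rough_synthetic_scores(real_emb, synth_emb):
    """Illustrative per-dataset checks over embedding vectors (rows = examples)."""
    nn_real = NearestNeighbors(n_neighbors=1).fit(real_emb)
    nn_synth = NearestNeighbors(n_neighbors=2).fit(synth_emb)

    # Unrealistic examples: synthetic points far from every real point.
    d_synth_to_real, _ = nn_real.kneighbors(synth_emb)
    realism = float(np.mean(1.0 / (1.0 + d_synth_to_real[:, 0])))

    # Memorization: synthetic points (near-)identical to some real point.
    memorization_rate = float(np.mean(d_synth_to_real[:, 0] < 1e-6))

    # Low diversity: synthetic points piled on top of one another
    # (column 1 skips each point's self-match at distance 0).
    d_synth_to_synth, _ = nn_synth.kneighbors(synth_emb)
    diversity = float(np.mean(d_synth_to_synth[:, 1]))

    # Underrepresentation: real points with no nearby synthetic counterpart.
    nn_synth_one = NearestNeighbors(n_neighbors=1).fit(synth_emb)
    d_real_to_synth, _ = nn_synth_one.kneighbors(real_emb)
    coverage = float(np.mean(1.0 / (1.0 + d_real_to_synth[:, 0])))

    return {
        "realism": realism,
        "memorization_rate": memorization_rate,
        "diversity": diversity,
        "coverage": coverage,
    }

# Example with random embeddings; substitute embeddings of your own data.
rng = np.random.default_rng(0)
print(rough_synthetic_scores(rng.normal(size=(500, 32)), rng.normal(size=(400, 32))))
```

Because the checks operate on embeddings rather than raw inputs, the same idea applies to image, text, or tabular data once you pick a suitable embedding model.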
Check out the blog for more details or the tutorial notebook: https://help.cleanlab.ai/tutorials/synthetic_data/