great looking ui and implementation, and kudos on the launch. how do you handle cases where predefined scoring metrics don’t fully capture what ‘good’ means for a specific use case, like ranking legal documents or detecting nuanced sentiment in customer reviews?
hey will, the metrics in the scoring system aren't predefined; they're generated per use case by breaking a subjective set of conditions down into a tree of metrics, where objective leaf metrics combine into a subjective aggregate score. Here is a prefilled playground with dimensions for the sentiment-analysis-in-customer-reviews use case. You can see how the tree breaks down, and if you paste in a review and click Run, you should see the scores roll up from the individual custom dimensions into the aggregate score. Hope this helps!
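To make the tree idea concrete, here's a minimal sketch of that structure: objective leaf metrics (simple scoring functions) combined by weighted aggregate nodes into one subjective score. All the names and scoring functions here (`LeafMetric`, `AggregateMetric`, `polarity`, `enthusiasm`, the word lists and weights) are hypothetical illustrations, not the product's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LeafMetric:
    """An objective metric: scores a review on one narrow dimension, 0.0 to 1.0."""
    name: str
    score_fn: Callable[[str], float]

    def score(self, text: str) -> float:
        return self.score_fn(text)


@dataclass
class AggregateMetric:
    """A subjective dimension: a weighted combination of child metrics.

    Children can be leaves or other aggregates, so these nodes form a tree.
    """
    name: str
    children: List  # LeafMetric or AggregateMetric
    weights: List[float]

    def score(self, text: str) -> float:
        total = sum(w * c.score(text) for c, w in zip(self.children, self.weights))
        return total / sum(self.weights)


# Hypothetical leaf scorers for the customer-review sentiment use case.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "broken"}


def polarity(text: str) -> float:
    """Fraction of sentiment-bearing words that are positive (0.5 if none)."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.5 if pos + neg == 0 else pos / (pos + neg)


def enthusiasm(text: str) -> float:
    """Crude intensity signal: exclamation marks, capped at 1.0."""
    return min(text.count("!") / 3, 1.0)


# One aggregate node combining two objective leaves into a subjective score.
sentiment = AggregateMetric(
    name="sentiment",
    children=[LeafMetric("polarity", polarity), LeafMetric("enthusiasm", enthusiasm)],
    weights=[0.8, 0.2],
)

review = "I love this product! Support was excellent."
print(round(sentiment.score(review), 2))
```

The aggregate normalizes by the weight sum, so each node stays in [0, 1] no matter how the tree is nested, which keeps scores comparable across dimensions.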