New Feature: Evaluations powered by DeepEval from Confident AI are now integrated into the claim normalization pipeline to assess the quality of extracted claims.
Overview
The evaluation system automatically:
- Analyzes claim extraction quality using industry-standard metrics
- Generates detailed evaluation reports in CSV format for local analysis
- Provides cloud dashboard access for advanced trace analysis
- Requires no custom code - evaluations run automatically in the pipeline
- Integrates with Confident AI for comprehensive evaluation tracking
- DeepEval Framework: Powered by DeepEval’s comprehensive evaluation framework
- Confident AI Platform: Access detailed traces and test runs on Confident AI’s dashboard
Getting Started
API Key Setup
To access cloud dashboard features and detailed traces, you’ll need a Confident AI API key:
1. Get your Confident AI API Key: Visit confident-ai.com and create an account to obtain your API key.
2. Set your environment variable (see the sketch after these steps).
3. Include in API requests: Pass your Confident AI API key along with your CheckThat AI requests to enable dashboard tracking.
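A minimal sketch of step 2. In a shell you would typically run `export CONFIDENT_API_KEY="your-api-key"`; the Python equivalent below sets the same variable for the current process. The `CONFIDENT_API_KEY` name follows its usage elsewhere on this page.

```python
import os

# Set the Confident AI key for the current process.
# Shell equivalent: export CONFIDENT_API_KEY="your-api-key"
os.environ["CONFIDENT_API_KEY"] = "your-api-key"
```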
Basic Usage
Evaluations run automatically when you use CheckThat AI’s claim normalization features, as in the sketch below.
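A minimal sketch of such a request. The endpoint URL, payload fields, and header name below are illustrative assumptions, not the documented CheckThat AI schema; only the `CONFIDENT_API_KEY` variable comes from this page.

```python
import os
import requests

# Hypothetical claim-normalization request; the URL, payload, and
# header name are illustrative assumptions.
response = requests.post(
    "https://api.checkthat.example/v1/normalize",  # placeholder endpoint
    json={"text": "The new policy reduced emissions by 40% in 2023."},
    headers={"X-Confident-API-Key": os.environ["CONFIDENT_API_KEY"]},
)
print(response.json())  # extracted claims, with evaluations run automatically
```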
Evaluation Metrics
The evaluation system assesses multiple dimensions of claim extraction quality:
Accuracy Metrics
Claim Identification Accuracy
- Precision: How many extracted claims are actually valid claims
- Recall: How many valid claims were successfully identified
- F1-Score: Harmonic mean of precision and recall (see the sketch after this list)
- Semantic similarity between original text and extracted claims
- Factual consistency of extracted information
- Preservation of original meaning and context
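As a reference for the F1 item above, a small sketch of the standard precision/recall/F1 relationship; this is the textbook definition, not code from the pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. 8 of 10 extracted claims are valid (precision = 0.8) and
# 8 of 12 valid claims were found (recall ~ 0.67) -> F1 ~ 0.73
print(f1_score(0.8, 8 / 12))
```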
Quality Metrics
Completeness
- Coverage of all relevant claims in the source text
- Identification of implicit vs. explicit claims
- Handling of compound and nested claims
- Logical consistency of extracted claims
- Proper claim boundaries and segmentation
- Maintenance of causal relationships
Confidence Scoring
Extraction Confidence
- Model confidence in claim identification
- Uncertainty quantification for ambiguous cases
- Reliability scores for different claim types
- Assessment of how well-formed claims are for fact-checking
- Identification of claims requiring additional context
- Flagging of unverifiable or opinion-based statements
CSV Evaluation Reports
Receive detailed evaluation reports that you can save locally for analysis.
Report Structure
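A sketch of loading a saved report for local analysis with pandas. The filename is illustrative, and the report's column layout is not documented here; inspect `report.columns` to see the actual schema.

```python
import pandas as pd

# Load a locally saved evaluation report (filename is illustrative).
report = pd.read_csv("claim_evaluation_report.csv")

print(report.columns.tolist())  # inspect the actual report schema
print(report.describe())        # summary statistics for numeric metrics
```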
Sample Evaluation Report
Downloading Reports
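A hedged sketch of fetching a report over HTTP; the endpoint, query parameter, and response format are assumptions, not a documented CheckThat AI route.

```python
import os
import requests

# Hypothetical report download; endpoint and parameters are illustrative.
resp = requests.get(
    "https://api.checkthat.example/v1/evaluations/report",  # placeholder
    params={"run_id": "your-run-id"},                       # hypothetical
    headers={"X-Confident-API-Key": os.environ["CONFIDENT_API_KEY"]},
)
resp.raise_for_status()
with open("claim_evaluation_report.csv", "wb") as f:
    f.write(resp.content)
```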
Confident AI Dashboard Access
Access detailed traces and advanced analytics through the Confident AI cloud dashboard:
Dashboard Features
- Test Run Tracking: View detailed test runs with input/output traces and performance metrics
- Evaluation Analytics: Comprehensive analytics with visualizations and trend analysis
- Model Comparison: Compare performance across different models and configurations
- Custom Metrics: Configure custom evaluation metrics for specific use cases (see the sketch below)
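If you do configure custom metrics, a sketch of what that can look like with DeepEval's GEval metric; the criterion wording and test-case contents are illustrative, and running it requires a judge model configured for DeepEval.

```python
from deepeval import evaluate
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# An illustrative custom "claim fidelity" criterion.
claim_fidelity = GEval(
    name="Claim Fidelity",
    criteria="Does the extracted claim preserve the meaning of the source text?",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

test_case = LLMTestCase(
    input="The company said revenue grew 12% last quarter.",
    actual_output="The company's revenue grew 12% last quarter.",
)
evaluate(test_cases=[test_case], metrics=[claim_fidelity])
```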
Accessing the Dashboard
1. Ensure API Key is Set: Make sure your CONFIDENT_API_KEY is included in your requests, as shown in the sketch after this list.
2. Visit Confident AI Dashboard: Go to confident-ai.com and log in with your account.
3. View Your Test Runs: Navigate to your test runs to see detailed traces of your claim evaluation processes.
4. Analyze Performance: Use the dashboard’s analytics tools to identify patterns and optimize your claim extraction workflows.
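For step 1, a minimal sketch that checks the key is present before you send requests; per the FAQ below, basic evaluations still run without it.

```python
import os
import warnings

# Basic evaluations run without the key, but dashboard traces and
# advanced analytics require it.
if not os.environ.get("CONFIDENT_API_KEY"):
    warnings.warn("CONFIDENT_API_KEY is not set; Confident AI dashboard "
                  "tracking will be unavailable for this run.")
```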
Dashboard Screenshots

Confident AI dashboard showing evaluation metrics and traces
Detailed test run view with claim extraction analysis
Integration Examples
Batch Evaluation Workflow
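The example that belongs here could not be recovered, so below is a hedged sketch of one possible batch workflow, reusing the hypothetical endpoint from Basic Usage; all names are illustrative assumptions.

```python
import os
import requests

# Hypothetical batch workflow: normalize several texts and collect the
# per-text results (each of which includes automatic evaluations).
API_URL = "https://api.checkthat.example/v1/normalize"  # placeholder
HEADERS = {"X-Confident-API-Key": os.environ["CONFIDENT_API_KEY"]}

texts = [
    "The vaccine was approved in 2021 after phase-3 trials.",
    "Unemployment fell to a ten-year low last month.",
]

results = []
for text in texts:
    resp = requests.post(API_URL, json={"text": text}, headers=HEADERS)
    resp.raise_for_status()
    results.append(resp.json())

print(f"Evaluated {len(results)} documents")
```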
Best Practices
Evaluation Strategy
Regular Quality Monitoring
- Set up automated evaluation for production workflows
- Monitor evaluation trends over time
- Set quality thresholds and alerts for low-performance cases (see the sketch after this list)
- Choose evaluation metrics that align with your use case
- Balance accuracy, completeness, and processing speed
- Consider domain-specific evaluation criteria
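A sketch of a simple threshold alert over a downloaded CSV report; the column name and the 0.7 cutoff are assumptions for illustration.

```python
import pandas as pd

# Flag claims whose score falls below a quality threshold.
# "f1_score" and 0.7 are illustrative assumptions about the schema.
report = pd.read_csv("claim_evaluation_report.csv")
below = report[report["f1_score"] < 0.7]
if not below.empty:
    print(f"ALERT: {len(below)} claims fell below the quality threshold")
```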
Performance Optimization
Batch Processing
- Process multiple claims together for better efficiency
- Use batch APIs when available for large-scale evaluation
- Implement proper rate limiting and error handling (a sketch follows this list)
- Monitor evaluation costs alongside regular API usage
- Use sampling for large datasets to control evaluation costs
- Balance evaluation frequency with budget constraints
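A minimal retry-with-backoff sketch for rate-limited calls; this is a generic pattern, not a documented CheckThat AI client.

```python
import time
import requests

def post_with_retries(url: str, max_retries: int = 3, **kwargs):
    """POST with exponential backoff on HTTP 429 (rate limited)."""
    for attempt in range(max_retries):
        resp = requests.post(url, **kwargs)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("still rate limited after retries")
```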
Data Analysis
Report Analysis
- Regularly review CSV reports for quality trends (see the sketch after this list)
- Identify patterns in low-quality extractions
- Use insights to improve prompt engineering
- Leverage Confident AI dashboard for deep analysis
- Set up custom metrics for your specific domain
- Use trace analysis to debug extraction issues
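A sketch of a weekly trend view over a report; the "created_at" and "f1_score" columns are assumptions about the report schema.

```python
import pandas as pd

# Weekly average quality, useful for spotting drift over time.
# Column names are illustrative assumptions.
report = pd.read_csv("claim_evaluation_report.csv", parse_dates=["created_at"])
weekly = report.set_index("created_at")["f1_score"].resample("W").mean()
print(weekly)
```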
FAQ
Do I need to write any code for evaluations?
No! Evaluations run automatically within the claim normalization pipeline. You just need to include your CONFIDENT_API_KEY to access dashboard features and detailed traces.
How do I get CSV evaluation reports?
CSV reports are generated automatically and can be downloaded via the API or accessed through the Confident AI dashboard. Reports include detailed metrics for each extracted claim.
What if I don't have a Confident AI API key?
Basic evaluations will still run automatically. However, you’ll need a CONFIDENT_API_KEY to access the cloud dashboard, detailed traces, and advanced analytics features.
How often should I review evaluation reports?
For production systems, we recommend daily or weekly review of evaluation metrics. For development and testing, review after each significant change to your claim extraction workflow.
Getting Help: Visit confident-ai.com for detailed documentation on the Confident AI platform, or deepeval.com for information about the evaluation framework.

