Structured Outputs

CheckThat AI supports structured outputs that ensure your API responses follow a specific schema. This feature is particularly useful for data extraction, classification tasks, and when you need consistent, parseable output formats.

Structured outputs work with all supported models and can be combined with claim refinement and fact-checking capabilities for verified, structured data extraction.

Overview

Structured outputs allow you to:

Ensure consistent response formats across all API calls with type safety
Extract specific data fields from unstructured text reliably
Validate responses against predefined Pydantic models or JSON schemas
Simplify parsing by receiving guaranteed, type-safe structures
Combine with claim refinement for iteratively improved accuracy
Integrate evaluation metrics for quality assurance

Using Pydantic Models (Recommended)

The recommended approach is to call the parse() method with a Pydantic model, which gives you full type safety and validation:

from checkthat_ai import CheckThatAI
from pydantic import BaseModel, Field
from typing import List, Literal

class ClaimExtraction(BaseModel):
    claim: str = Field(description="The main claim extracted from the text")
    category: Literal["health", "politics", "science", "technology", "other"] = Field(
        description="Category of the claim"
    )
    confidence: float = Field(
        ge=0.0, le=1.0,
        description="Confidence level in claim extraction"
    )
    evidence_required: bool = Field(
        description="Whether the claim requires fact-checking"
    )

client = CheckThatAI(api_key="your-api-key")

response = client.chat.completions.parse(
    model="gpt-5-2025-08-07",  # Use latest models
    messages=[{
        "role": "user",
        "content": "Extract the main claim: 'Studies show that drinking green tea daily can reduce cancer risk by 30%.'"
    }],
    response_format=ClaimExtraction
)

# Access typed, validated response
parsed = response.choices[0].message.parsed
print(f"Claim: {parsed.claim}")
print(f"Category: {parsed.category}")
print(f"Confidence: {parsed.confidence}")
print(f"Evidence Required: {parsed.evidence_required}")

Alternative: JSON Schema Format

You can also use JSON schema format directly with the regular create() method:

from checkthat_ai import CheckThatAI
import json

client = CheckThatAI(api_key="your-api-key")

# Define JSON schema for the response
schema = {
    "type": "object",
    "properties": {
        "claim": {
            "type": "string",
            "description": "The main claim extracted from the text"
        },
        "category": {
            "type": "string",
            "enum": ["health", "politics", "science", "technology", "other"],
            "description": "Category of the claim"
        },
        "confidence": {
            "type": "number",
            "minimum": 0,
            "maximum": 1,
            "description": "Confidence level in claim extraction"
        },
        "evidence_required": {
            "type": "boolean",
            "description": "Whether the claim requires fact-checking"
        }
    },
    "required": ["claim", "category", "confidence", "evidence_required"]
}

response = client.chat.completions.create(
    model="gpt-5-2025-08-07",  # Use latest models
    messages=[
        {"role": "user", "content": "Extract the main claim from this text: 'Studies show that drinking green tea daily can reduce cancer risk by 30%.'"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "claim_extraction",
            "schema": schema
        }
    }
)

# Response will be valid JSON matching the schema
parsed_response = json.loads(response.choices[0].message.content)
print(f"Claim: {parsed_response['claim']}")
print(f"Category: {parsed_response['category']}")

Response Format

Structured Output Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1704067200,
  "model": "gpt-5-2025-08-07",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"claim\": \"Drinking green tea daily can reduce cancer risk by 30%\", \"category\": \"health\", \"confidence\": 0.85, \"evidence_required\": true}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 32,
    "total_tokens": 77
  }
}
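Note that `content` in the response above is a JSON-encoded string, not a nested object, so it needs one decoding step before its fields can be read. The sketch below uses only the standard library and a trimmed copy of the sample response. The `finish_reason` check assumes OpenAI-style semantics, where any value other than `"stop"` may indicate a truncated and therefore unparseable message; that behavior is an assumption, not something this page states.

```python
import json

# Trimmed copy of the sample response above (only the fields used here).
raw_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "{\"claim\": \"Drinking green tea daily can reduce cancer risk by 30%\", \"category\": \"health\", \"confidence\": 0.85, \"evidence_required\": true}",
            },
            "finish_reason": "stop",
        }
    ]
}

choice = raw_response["choices"][0]
extraction = None

# Assumption: only a "stop" finish reason signals a complete message.
if choice["finish_reason"] == "stop":
    # `content` is a string, so decode it into a dict before reading fields.
    extraction = json.loads(choice["message"]["content"])
    print(extraction["category"], extraction["confidence"])
```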

Advanced Examples

Data Extraction from Text

Extract multiple data points from complex text:

from checkthat_ai import CheckThatAI
import json

client = CheckThatAI(api_key="your-api-key")

# Schema for extracting article metadata
article_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "publication_date": {"type": "string", "format": "date"},
        "claims": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "statement": {"type": "string"},
                    "type": {
                        "type": "string",
                        "enum": ["fact", "opinion", "prediction", "statistic"]
                    },
                    "verifiable": {"type": "boolean"}
                },
                "required": ["statement", "type", "verifiable"]
            }
        },
        "overall_credibility": {
            "type": "string",
            "enum": ["high", "medium", "low", "unknown"]
        }
    },
    "required": ["title", "claims", "overall_credibility"]
}

article_text = """
Title: "The Future of Renewable Energy"
By Dr. Sarah Johnson, Published March 15, 2024

Solar power efficiency has increased by 40% in the last decade. 
I believe this trend will continue. By 2030, renewable energy 
will likely comprise 60% of global electricity generation.
"""

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",  # Use latest models
    messages=[
        {"role": "user", "content": f"Extract structured information from this article: {article_text}"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "article_analysis",
            "schema": article_schema
        }
    }
)

result = json.loads(response.choices[0].message.content)
for claim in result['claims']:
    print(f"Claim: {claim['statement']}")
    print(f"Type: {claim['type']}")
    print(f"Verifiable: {claim['verifiable']}")
    print("---")
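Once the result is parsed, ordinary Python is enough to post-process it. The sketch below groups claims by type and picks out the verifiable ones. The `result` dict is a hand-written stand-in shaped like `article_schema`, not real API output, and the names `by_type` and `to_verify` are illustrative.

```python
from collections import defaultdict

# Hand-written stand-in for the parsed `result` dict; not real API output.
result = {
    "title": "The Future of Renewable Energy",
    "claims": [
        {"statement": "Solar power efficiency has increased by 40% in the last decade.",
         "type": "statistic", "verifiable": True},
        {"statement": "This trend will continue.",
         "type": "opinion", "verifiable": False},
        {"statement": "By 2030, renewable energy will likely comprise 60% of global electricity generation.",
         "type": "prediction", "verifiable": False},
    ],
    "overall_credibility": "unknown",
}

# `author` and `publication_date` are not in the schema's `required` list,
# so read them with a fallback instead of indexing directly.
author = result.get("author", "unknown author")

# Group claim statements by their declared type.
by_type = defaultdict(list)
for claim in result["claims"]:
    by_type[claim["type"]].append(claim["statement"])

# Keep only the claims flagged as verifiable.
to_verify = [c["statement"] for c in result["claims"] if c["verifiable"]]

print(f"{result['title']} ({author})")
for claim_type, statements in by_type.items():
    print(f"{claim_type}: {len(statements)}")
print("Send to fact-checking:", to_verify)
```

Reading optional fields with `.get()` matters here because a response that omits `author` is still valid under the schema.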

Fact-Checking with Structured Output

Combine CheckThat AI’s fact-checking capabilities with structured outputs:

from checkthat_ai import CheckThatAI
import json

client = CheckThatAI(api_key="your-api-key")

fact_check_schema = {
    "type": "object",
    "properties": {
        "claims": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "claim_text": {"type": "string"},
                    "verdict": {
                        "type": "string",
                        "enum": ["true", "false", "partially_true", "unverified", "misleading"]
                    },
                    "confidence": {
                        "type": "number",
                        "minimum": 0,
                        "maximum": 1
                    },
                    "evidence_summary": {"type": "string"},
                    "sources": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Authoritative sources consulted"
                    },
                    "context_needed": {"type": "boolean"}
                },
                "required": ["claim_text", "verdict", "confidence", "evidence_summary"]
            }
        },
        "overall_assessment": {
            "type": "string",
            "enum": ["reliable", "questionable", "unreliable", "mixed"]
        },
        "recommendation": {"type": "string"}
    },
    "required": ["claims", "overall_assessment", "recommendation"]
}

response = client.chat.completions.create(
    model="gpt-5-2025-08-07",  # Use latest models
    messages=[
        {"role": "system", "content": "You are a fact-checking expert. Analyze claims and provide structured evidence-based assessments."},
        {"role": "user", "content": "Fact-check this statement: 'COVID-19 vaccines are 95% effective and have been tested on millions of people worldwide.'"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "fact_check_result",
            "schema": fact_check_schema
        }
    }
)

fact_check = json.loads(response.choices[0].message.content)
for claim in fact_check['claims']:
    print(f"Claim: {claim['claim_text']}")
    print(f"Verdict: {claim['verdict']}")
    print(f"Confidence: {claim['confidence']:.2f}")
    print(f"Evidence: {claim['evidence_summary']}")
    print("---")

Schema Validation

CheckThat AI validates all structured outputs against your JSON schema:

from checkthat_ai import CheckThatAI
import json

client = CheckThatAI(api_key="your-api-key")

# This schema requires specific fields and data types
strict_schema = {
    "type": "object",
    "properties": {
        "timestamp": {
            "type": "string",
            "format": "date-time"
        },
        "score": {
            "type": "number",
            "minimum": 0,
            "maximum": 100
        },
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,
            "maxItems": 5
        },
        "status": {
            "type": "string",
            "enum": ["pending", "approved", "rejected"]
        }
    },
    "required": ["timestamp", "score", "status"],
    "additionalProperties": False  # Strict: no extra fields allowed
}

try:
    response = client.chat.completions.create(
        model="gpt-5-2025-08-07",  # Use latest models
        messages=[
            {"role": "user", "content": "Evaluate this claim and provide a structured assessment with timestamp, score, and status."}
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "strict_evaluation",
                "schema": strict_schema
            }
        }
    )
    
    # Response is guaranteed to match the schema
    result = json.loads(response.choices[0].message.content)
    print("✅ Valid structured response received")
    
except json.JSONDecodeError:
    print("❌ Invalid JSON received")
except Exception as e:
    print(f"❌ Error: {e}")
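If you also want a defensive check on the client side, a few schema rules can be re-checked with plain Python. The helper below is a minimal sketch: `check_against_schema` is a hypothetical name, and it covers only `required`, `additionalProperties: False`, `enum`, and `minimum`/`maximum` on top-level properties. It is not a full JSON Schema validator; a dedicated validation library would be the better choice in production.

```python
import json


def check_against_schema(data, schema):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    properties = schema.get("properties", {})

    # Every field listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")

    # With additionalProperties set to False, undeclared fields are errors.
    if schema.get("additionalProperties") is False:
        for name in data:
            if name not in properties:
                problems.append(f"unexpected field: {name}")

    for name, rules in properties.items():
        if name not in data:
            continue
        value = data[name]
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"{name}: {value!r} is not an allowed value")
        # Range checks apply to real numbers only (bool is excluded).
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and "minimum" in rules and value < rules["minimum"]:
            problems.append(f"{name}: {value} is below minimum {rules['minimum']}")
        if is_number and "maximum" in rules and value > rules["maximum"]:
            problems.append(f"{name}: {value} is above maximum {rules['maximum']}")

    return problems


schema = {
    "type": "object",
    "properties": {
        "score": {"type": "number", "minimum": 0, "maximum": 100},
        "status": {"type": "string", "enum": ["pending", "approved", "rejected"]},
    },
    "required": ["score", "status"],
    "additionalProperties": False,
}

good = json.loads('{"score": 72, "status": "approved"}')
bad = json.loads('{"score": 140, "status": "done", "note": "extra"}')

print(check_against_schema(good, schema))
for problem in check_against_schema(bad, schema):
    print(problem)
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that went wrong in a single pass.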

Best Practices

Schema Design Guidelines

Keep schemas focused and specific:

Define clear, descriptive property names
Use enums for categorical data to ensure consistency
Set appropriate constraints (min/max values, string lengths)
Include descriptions for complex fields

Example of a well-designed schema:

{
  "type": "object",
  "properties": {
    "extracted_entities": {
      "type": "array",
      "description": "List of entities found in the text",
      "items": {
        "type": "object",
        "properties": {
          "entity": {"type": "string", "description": "The entity text"},
          "type": {
            "type": "string",
            "enum": ["person", "organization", "location", "date"],
            "description": "Category of the entity"
          },
          "confidence": {
            "type": "number",
            "minimum": 0,
            "maximum": 1,
            "description": "Confidence in entity extraction"
          }
        },
        "required": ["entity", "type", "confidence"]
      }
    }
  },
  "required": ["extracted_entities"]
}

Performance Optimization

Optimize for speed and accuracy:

Use simpler schemas for faster processing
Avoid deeply nested objects when possible
Set reasonable array size limits
Consider model capabilities when designing schemas

Token efficiency:

Shorter property names reduce token usage
Use enums instead of free-form text where possible
Balance between structure detail and token cost
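To get a feel for how much property names contribute to output size, you can compare the serialized length of the same records under verbose and compact keys. This is a rough sketch: character count is only a proxy for tokens, since actual token counts depend on the model's tokenizer, and the field names are invented for illustration.

```python
import json


def serialized_size(records):
    """Length in characters of the compact JSON encoding of `records`."""
    return len(json.dumps(records, separators=(",", ":")))


statement = "Solar power efficiency has increased by 40% in the last decade."

# The same 20 records, once with verbose keys and once with compact keys.
verbose = [
    {"extracted_claim_statement_text": statement, "claim_is_verifiable_flag": True}
    for _ in range(20)
]
compact = [
    {"statement": statement, "verifiable": True}
    for _ in range(20)
]

verbose_size = serialized_size(verbose)
compact_size = serialized_size(compact)
saved = verbose_size - compact_size

print(f"verbose: {verbose_size} chars, compact: {compact_size} chars")
print(f"saved: {saved} chars ({saved / verbose_size:.0%})")
```

The saving scales with the number of array items, because every item repeats its keys. Very short or cryptic names can hurt output quality, so treat this as a trade-off and not a rule.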

Error Recovery

Handle validation failures gracefully:

import json

def robust_structured_call(client, messages, schema, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-5-2025-08-07",  # Use latest models
                messages=messages,
                response_format={
                    "type": "json_schema",
                    "json_schema": {"name": "structured_output", "schema": schema}
                }
            )

            # Parse the response; raises json.JSONDecodeError on malformed JSON
            data = json.loads(response.choices[0].message.content)
            return data

        except Exception:  # json.JSONDecodeError is a subclass of Exception
            if attempt == max_retries - 1:
                raise  # Out of retries: re-raise with the original traceback
            print(f"Attempt {attempt + 1} failed, retrying...")

    # Only reached when max_retries is 0 or negative
    return None
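A common variation waits between attempts, doubling the delay each time (exponential backoff). The sketch below separates the retry logic from the API call so it can run without network access: `fetch_json_text` is a hypothetical zero-argument callable that returns the message content string, for example a lambda wrapping the `create()` call. Unlike the function above, it retries only on malformed JSON and lets other exceptions propagate.

```python
import json
import time


def call_with_backoff(fetch_json_text, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch_json_text() until its return value parses as JSON.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts. `sleep` is injectable so tests need not actually wait.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")

    for attempt in range(max_retries):
        try:
            return json.loads(fetch_json_text())
        except json.JSONDecodeError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the last parsing error
            sleep(base_delay * 2 ** attempt)


# Simulated replies: the first is truncated JSON, the second is valid.
replies = iter(['{"status": "pend', '{"status": "approved"}'])
delays = []  # Records requested delays instead of sleeping

result = call_with_backoff(lambda: next(replies), sleep=delays.append)
print(result, delays)
```

Passing the call in as a function keeps the retry policy independent of the client, so the same helper can wrap either `create()` or `parse()`.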

Common Use Cases

Data Extraction

Extract structured information from unstructured text, documents, or web content.

Content Classification

Categorize content with consistent taxonomies and confidence scores.

Fact-Check Reports

Generate standardized fact-checking reports with evidence and sources.

Survey Analysis

Process survey responses into structured data for analysis.

Schema Complexity: Very complex schemas with deep nesting or many constraints may impact response time and accuracy. Start simple and iterate based on your needs.

Testing Schemas: Test your JSON schemas with sample data before deploying to production. Use online JSON schema validators to ensure your schemas are well-formed.
