SynthABA Technical Integration Guide
Version 1.0.0 | For ML Engineers and Data Engineers
Table of Contents
- Quick Start
- API Reference
- Authentication
- Document Types Catalog
- Data Format & Schema
- Schema Validation
- Train/Val/Test Splits
- Quality Metrics
- Batch Output Structure
- Integration Examples
- Rate Limits & Timeouts
- Versioning
- Support
Quick Start
Generate your first batch of synthetic ABA clinical records in under 30 seconds:
```bash
curl -X POST https://synthaba-production.up.railway.app/generate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-api-key" \
  -d '{"case_type": "soap_note", "count": 5, "language": "en"}'
```
The response includes the generated records inline along with quality metrics, split information, and batch provenance. Every record is validated against its Pydantic schema before it leaves the pipeline.
To verify the service is running:
```bash
curl https://synthaba-production.up.railway.app/health
# {"status": "ok", "generator_ready": true}
```
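The same Quick Start call can be made from Python using only the standard library. This is a hedged sketch: the endpoint, headers, and body fields come from this guide, while the helper names (`build_request`, `generate`) are illustrative:

```python
import json
import urllib.request

API_URL = "https://synthaba-production.up.railway.app/generate"

def build_request(case_type: str, count: int, language: str = "en") -> dict:
    """Assemble the request body documented in the API reference."""
    if not 1 <= count <= 2000:
        raise ValueError("count must be between 1 and 2000")
    return {"case_type": case_type, "count": count, "language": language}

def generate(api_key: str, case_type: str, count: int) -> dict:
    body = json.dumps(build_request(case_type, count)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )
    # Large batches can take minutes; use a generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)

# Usage (requires a live API key):
# batch = generate("your-api-key", "soap_note", 5)
# print(batch["batch_id"], batch["quality_score"])
```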
API Reference
POST /generate
Generate a batch of synthetic clinical records.
Request Body
| Field | Type | Required | Default | Description |
|-------------|--------|----------|------------|--------------------------------------------------|
| case_type | string | Yes | -- | One of the 25 supported document types (see catalog below) |
| count | int | Yes | -- | Number of records to generate (1 -- 2000) |
| language | string | No | "en" | Language code: en (English) or es (Spanish) |
| client_id | string | No | "api" | Identifier for your application or pipeline |
| order_id | string | No | null | Your internal order or job reference |
| target | string | No | "cloud" | Deployment target hint |
| priority | string | No | "normal" | Processing priority: normal, high, or urgent |
Response Body
| Field | Type | Description |
|---------------------|---------|------------------------------------------------------------|
| batch_id | string | Unique identifier for this batch (UUID) |
| path | string | Server-side path to the batch output directory |
| records_generated | int | Number of records successfully generated |
| errors | int | Number of records that failed generation |
| success_rate | float | Ratio of successful records (0.0 -- 1.0) |
| quality_score | float | Overall quality score from the 10-gate pipeline (0.0 -- 1.0)|
| quality_passed | bool | Whether the batch passed all quality gates |
| splits | object | Record counts per split: {"train": N, "validation": N, "test": N} |
| ready_for_vlayer | bool | Whether the batch meets VLayer v2 compliance standards |
| elapsed_seconds | float | Wall-clock time for generation |
| cases | array | Full array of generated records (train + validation + test combined) |
Error Codes
| Code | Meaning |
|------|---------------------------------------------------------------|
| 400 | Invalid input -- bad case_type, count out of range, etc. |
| 401 | Invalid or missing API key |
| 500 | Internal generation error (include batch_id in support tickets) |
| 503 | Generator not ready -- the model is still initializing |
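Of these codes, only 503 is transient (the generator is still initializing), so it is the only one worth retrying automatically. A minimal retry sketch using the standard library; the policy function and constants are illustrative, not part of the SynthABA client:

```python
import time
import urllib.error
import urllib.request

# 503 means the generator is still initializing and is safe to retry;
# 400/401/500 indicate problems a retry will not fix.
RETRYABLE_CODES = {503}

def should_retry(status: int, attempt: int, max_attempts: int = 5) -> bool:
    """Retry policy: only retryable codes, and only while attempts remain."""
    return status in RETRYABLE_CODES and attempt < max_attempts

def post_with_retry(req: urllib.request.Request, max_attempts: int = 5):
    for attempt in range(1, max_attempts + 1):
        try:
            return urllib.request.urlopen(req, timeout=300)
        except urllib.error.HTTPError as e:
            if not should_retry(e.code, attempt, max_attempts):
                raise  # non-retryable, or attempts exhausted
            time.sleep(2 ** attempt)  # exponential backoff before retrying
```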
GET /health
Check whether the API and generator are online.
Response
```json
{
  "status": "ok",
  "generator_ready": true
}
```
If generator_ready is false, the server is still loading. Retry after a few seconds.
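The readiness check can be wrapped in a small helper before kicking off generation. A sketch, assuming the `/health` response shape shown above; `is_ready` is an illustrative name:

```python
import json

def is_ready(health_body: str) -> bool:
    """Parse a /health response body and report whether generation can start."""
    payload = json.loads(health_body)
    return payload.get("status") == "ok" and payload.get("generator_ready") is True

# Polling sketch (pair with any HTTP client; `fetch_health` is hypothetical):
# while not is_ready(fetch_health("https://synthaba-production.up.railway.app/health")):
#     time.sleep(5)
```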
Web API Routes (Vercel)
The SynthABA commerce layer runs on Vercel and provides these endpoints:
| Route | Method | Description |
|---------------------------|--------|------------------------------------------------------------|
| /api/create-checkout | POST | Creates a Stripe checkout session for purchasing a dataset |
| /api/webhook | POST | Stripe webhook handler -- triggers generation, upload, and email delivery |
| /api/download/[id] | GET | Returns a signed download URL (24-hour expiry) for a completed batch |
| /api/request-sample | POST | Generates a free 5-record sample (rate limited per email) |
These routes are for the web storefront. If you are integrating SynthABA data into an ML pipeline, use the Railway API directly.
Authentication
Railway API (Generation)
Pass your API key in the X-API-Key header on every request:
```bash
curl -H "X-API-Key: your-api-key" \
  https://synthaba-production.up.railway.app/generate ...
```
If no GENERATOR_API_KEY is configured on the server, the endpoint runs in open dev mode and accepts any request. In production, an invalid or missing key returns HTTP 401.
Web API (Commerce)
The Vercel web routes use Stripe for payment authentication. No separate API key is needed -- Stripe session tokens handle authorization for checkout and download flows.
Document Types Catalog
SynthABA generates 25 clinical document types across four disciplines.
ABA -- Applied Behavior Analysis (10 types)
| Type Key | Description |
|-----------------------|---------------------------------------------------------------|
| soap_note | Subjective/Objective/Assessment/Plan session documentation |
| abc_data | Antecedent-Behavior-Consequence data collection records |
| treatment_goals | Individualized treatment goals with measurable objectives |
| insurance_auth | Insurance authorization requests with medical necessity |
| crisis_plan | Behavioral crisis intervention and de-escalation plans |
| supervision_note | BCBA supervision session documentation |
| discharge_summary | Treatment discharge summaries with outcome data |
| progress_report | Periodic progress reports for payors and families |
| session_data | Quantitative session data (trial counts, duration, frequency) |
| assessment_section | Functional behavior assessment sections |
Psychotherapy / Mental Health (8 types)
| Type Key | Description |
|-------------------------------|-----------------------------------------------------|
| psychotherapy_note | Individual psychotherapy session notes |
| psychiatric_eval | Psychiatric evaluation and diagnostic assessment |
| mental_status_exam | Mental status examination documentation |
| safety_plan | Safety planning for at-risk individuals |
| group_therapy_note | Group therapy session documentation |
| family_therapy_note | Family therapy session documentation |
| psychological_testing | Psychological testing reports and interpretations |
| substance_abuse_assessment | Substance use disorder assessment documentation |
Speech-Language Pathology (4 types)
| Type Key | Description |
|-----------------------|-----------------------------------------------------|
| slp_evaluation | Speech-language pathology initial evaluation |
| slp_session_note | SLP treatment session documentation |
| slp_progress_report | SLP periodic progress reports |
| slp_treatment_plan | SLP treatment plan with goals and objectives |
Occupational Therapy (3 types)
| Type Key | Description |
|---------------------|-------------------------------------------------------|
| ot_evaluation | Occupational therapy initial evaluation |
| ot_session_note | OT treatment session documentation |
| ot_treatment_plan | OT treatment plan with goals and objectives |
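For client-side validation before calling /generate, the 25 type keys above can be collected into a lookup set. The keys are taken verbatim from the catalog; the constant and helper names are illustrative:

```python
# The 25 document type keys from the catalog, grouped by discipline.
ABA_TYPES = {"soap_note", "abc_data", "treatment_goals", "insurance_auth",
             "crisis_plan", "supervision_note", "discharge_summary",
             "progress_report", "session_data", "assessment_section"}
MH_TYPES = {"psychotherapy_note", "psychiatric_eval", "mental_status_exam",
            "safety_plan", "group_therapy_note", "family_therapy_note",
            "psychological_testing", "substance_abuse_assessment"}
SLP_TYPES = {"slp_evaluation", "slp_session_note", "slp_progress_report",
             "slp_treatment_plan"}
OT_TYPES = {"ot_evaluation", "ot_session_note", "ot_treatment_plan"}

ALL_TYPES = ABA_TYPES | MH_TYPES | SLP_TYPES | OT_TYPES  # 25 keys total

def validate_case_type(case_type: str) -> str:
    """Fail fast locally instead of spending a request on a 400 response."""
    if case_type not in ALL_TYPES:
        raise ValueError(f"Unknown case_type: {case_type!r}")
    return case_type
```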
Data Format & Schema
Format
All records are JSON. Each batch produces three split files (train.json, validation.json, test.json), each containing a JSON array of record objects.
Base Envelope
Every record, regardless of document type, contains these base fields:
| Field | Type | Description |
|---------------------|----------|------------------------------------------------------|
| case_id | string | Unique UUID for this record |
| case_type | string | Document type key (e.g., soap_note) |
| language | string | en or es |
| difficulty | string | basic, intermediate, advanced, or expert |
| patient_context | object | De-identified patient demographics (see below) |
| generated_by | string | Model identifier used for generation |
| generator_version | string | Pipeline version (e.g., 1.0.0) |
| synthetic | bool | Always true -- marks the record as synthetic |
Patient Context Object
The patient_context field is present on every record and follows a fixed schema:
| Field | Type | Values / Range |
|----------------------------|---------------|-----------------------------------------------------|
| age_band | string (enum) | 2-3, 4-5, 6-11, 12-17, 18+ |
| sex | string (enum) | male, female |
| diagnosis_codes | array[string] | ICD-10 codes: F84.0, F84.1, F84.5, F90.2, F90.0, F70, F71, F80.1, F88 |
| severity | string (enum) | mild, moderate, severe |
| comorbidities | array[string] | Free text (e.g., "speech delay", "sensory processing") |
| insurer | string (enum) | uhc, bcbs, aetna, cigna, tricare, medicaid_fl |
| authorized_hours_weekly | float | Weekly authorized treatment hours |
| months_in_treatment | int | Duration of treatment in months |
| setting | string (enum) | clinic, home, school, telehealth |
Type-Specific Fields
Each document type adds fields specific to its clinical purpose. For example, a soap_note includes subjective, objective, assessment, plan, session_type, cpt_code, provider_role, and behaviors_observed. An abc_data record includes antecedent, behavior, consequence, function, and frequency fields.
Refer to the JSON Schema files in documentation/schemas/ within each batch for the complete field specification per type.
Schema Validation
Every batch includes exported JSON Schema files. You can validate records independently using either jsonschema or Pydantic directly.
Using jsonschema (Python)
```python
import json

import jsonschema

# Load the schema for your document type
with open("batch_xxx/documentation/schemas/soap_note_schema.json") as f:
    schema = json.load(f)

# Load records
with open("batch_xxx/data/train.json") as f:
    records = json.load(f)

# Validate each record
for i, record in enumerate(records):
    try:
        jsonschema.validate(record, schema)
    except jsonschema.ValidationError as e:
        print(f"Record {i} failed validation: {e.message}")
```
Using Pydantic (Python)
If you have access to the SynthABA template classes:
```python
from templates.soap_note import SyntheticSOAPNote

# Validate a single record
record = {"case_id": "...", "case_type": "soap_note", ...}
validated = SyntheticSOAPNote(**record)  # raises ValidationError on invalid data
```
Schema Export
The pipeline uses Pydantic v2 with model_json_schema() to export JSON Schema files. All fields have type constraints, max_length limits, and ge/le bounds where applicable. Enums are exported as {"enum": [...]} in the JSON Schema.
Train/Val/Test Splits
Every batch is automatically split into three partitions:
| Split | Percentage | File |
|--------------|------------|----------------------------|
| Train | 70% | data/train.json |
| Validation | 15% | data/validation.json |
| Test | 15% | data/test.json |
Split Properties
- Deterministic: Uses seed=42 for reproducible splits. The same input records will always produce the same partition.
- Stratified by severity: Records are shuffled while maintaining the overall demographic distribution across splits (mild, moderate, severe proportions are preserved).
- No leakage: Index-based partitioning guarantees zero overlap between splits.
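These three properties can be illustrated with a minimal re-implementation: a fixed-seed shuffle followed by index-based partitioning within each severity stratum. This is a hypothetical sketch of the described behavior, not the actual SynthABA splitter module:

```python
import random
from collections import defaultdict

def stratified_split(records, seed=42, ratios=(0.70, 0.15, 0.15)):
    """Deterministic, severity-stratified, leakage-free 70/15/15 split (sketch)."""
    by_severity = defaultdict(list)
    for r in records:
        by_severity[r["patient_context"]["severity"]].append(r)

    splits = {"train": [], "validation": [], "test": []}
    rng = random.Random(seed)  # fixed seed => same partition every run
    for stratum in by_severity.values():
        rng.shuffle(stratum)
        n = len(stratum)
        n_train = int(n * ratios[0])
        n_val = int(n * ratios[1])
        # Index-based slicing guarantees zero overlap between splits.
        splits["train"].extend(stratum[:n_train])
        splits["validation"].extend(stratum[n_train:n_train + n_val])
        splits["test"].extend(stratum[n_train + n_val:])
    return splits
```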
Loading Splits
```python
import json

def load_batch(batch_dir: str) -> dict:
    """Load all three splits from a batch directory."""
    splits = {}
    for split_name in ["train", "validation", "test"]:
        path = f"{batch_dir}/data/{split_name}.json"
        with open(path) as f:
            splits[split_name] = json.load(f)
    return splits

batch = load_batch("batch_xxx")
print(f"Train: {len(batch['train'])} records")
print(f"Validation: {len(batch['validation'])} records")
print(f"Test: {len(batch['test'])} records")
```
Quality Metrics
Every batch passes through a 10-gate quality pipeline before delivery. The quality report is saved to quality/quality_report.json and quality/quality_report_full.json.
The 10 Quality Gates
| Gate | Name | What It Checks | Pass Threshold |
|------|-------------------------|----------------------------------------------------------------|------------------|
| 1 | Schema Validation | Every record validates against its Pydantic model | < 2% reject rate |
| 2 | Completeness | At least one substantial text field is populated | >= 95% complete |
| 3 | Clinical Consistency | Age-hours, severity-goals, CPT-provider role rules hold | < 10% reject rate |
| 4 | Deduplication | No near-duplicates (TF-IDF cosine similarity > 0.95) | 0 duplicates |
| 5 | PHI Leak Detection | Zero matches against 10+ PHI patterns (SSN, phone, email, etc.)| 0 findings |
| 6 | Demographic Balance | Sex distribution within 15% of target (72% male / 28% female) | < 15% deviation |
| 7 | Vocabulary Consistency | Behavioral functions use closed vocabulary only | 0 invalid terms |
| 8 | Edge Case Coverage | Minimum representation of severe, comorbid, telehealth cases | >= 5% severe, >= 10% comorbid, >= 5% telehealth |
| 9 | Train/Val/Test Split | Splits are applied by the splitter module | Always passes |
| 10 | Version Control | Generator and template versions are recorded | Always passes |
Quality Report Fields
The quality/quality_report.json file contains:
```json
{
  "all_gates_passed": true,
  "overall_quality_score": 0.98,
  "total_records": 100,
  "records_passed": 98,
  "records_failed": 2,
  "metrics": {
    "total_records": 100,
    "valid_schema_rate": 1.0,
    "duplicate_rate": 0.0,
    "contradiction_rate": 0.02,
    "missing_critical_fields_rate": 0.0,
    "vocabulary_compliance_rate": 1.0,
    "internal_consistency_score": 0.97,
    "split_leakage_check": true,
    "edge_case_coverage": { ... },
    "clinical_plausibility_proxy": 0.98
  },
  "demographic_audit": { ... },
  "gate_results": { ... },
  "detailed_failures": [ ... ]
}
```
Key Metrics Explained
| Metric | Description |
|--------------------------------|------------------------------------------------------------------|
| valid_schema_rate | Fraction of records passing Pydantic validation (target: 1.0) |
| duplicate_rate | Fraction of record pairs with cosine similarity > 0.95 (target: 0.0) |
| contradiction_rate | Fraction of records failing clinical consistency rules |
| clinical_plausibility_proxy | Aggregate clinical consistency score (1.0 - contradiction_rate) |
| internal_consistency_score | Composite of age-hours, severity-hours, and CPT-provider checks |
| split_leakage_check | Boolean confirming zero overlap between train/val/test |
Demographic Audit
The demographic_audit section (also saved separately as quality/demographic_audit.json) breaks down record distribution across:
- Age group: 2-3, 4-5, 6-11, 12-17, 18+
- Sex: male, female
- Diagnosis profile: ICD-10 code distribution
- Severity band: mild, moderate, severe
- Setting: clinic, home, school, telehealth
- Payer context: uhc, bcbs, aetna, cigna, tricare, medicaid_fl
- Intervention mix: FCT, DRA, DTT, NET, PRT, token economy, etc.
- Behavior categories: escape, attention, tangible, sensory
- Language: en, es
Each category shows both raw count and percentage of total records.
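Converting the raw counts back into percentages for your own reporting is a one-liner. A sketch; the helper name is illustrative, and the exact key layout of demographic_audit.json is an assumption (check your own batch's file):

```python
import json

def audit_percentages(counts: dict) -> dict:
    """Convert a category's raw counts into percentages of the total."""
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}

# Usage sketch -- the "severity_band" key is assumed, not documented:
# with open("batch_abc123/quality/demographic_audit.json") as f:
#     audit = json.load(f)
# print(audit_percentages(audit["severity_band"]))
```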
Batch Output Structure
Every batch is written to a self-contained directory:
```text
batch_{id}/
├── data/
│   ├── train.json                # 70% of records
│   ├── validation.json           # 15% of records
│   └── test.json                 # 15% of records
├── documentation/
│   ├── datasheet.yaml            # Gebru et al. Datasheets for Datasets
│   ├── healthsheet.yaml          # Google FAccT Healthsheet
│   ├── data_card.yaml            # Google PAIR Data Card
│   ├── nutrition_label.yaml      # Dataset Nutrition Label
│   └── schemas/                  # JSON Schema files per document type
├── quality/
│   ├── quality_report.json       # Summary quality metrics
│   ├── quality_report_full.json  # Full gate-by-gate results
│   └── demographic_audit.json    # Demographic distribution breakdown
├── provenance/
│   ├── generation_manifest.yaml  # What was generated, when, by whom
│   ├── prompt_config.yaml        # Prompt templates used
│   ├── ontology_version.yaml     # Clinical vocabulary version
│   ├── pipeline_version.txt      # Generator version string
│   ├── policy_applied.yaml       # Data governance policies applied
│   └── vlayer_passport.json      # VLayer v2 compliance passport
├── compliance/
│   ├── audit_log.jsonl           # Immutable audit trail
│   ├── clinical_review/          # Clinical review artifacts
│   └── test_evidence/            # Automated test results
├── raw_batch.json                # All records before splitting
└── manifest.json                 # Batch metadata and file inventory
```
What Goes Where
- data/: The files you load into your ML pipeline. Three pre-split JSON files ready for training.
- documentation/: Human-readable dataset documentation following published standards. Useful for model cards, IRB submissions, and internal audits.
- quality/: Machine-readable quality metrics. Parse quality_report.json to gate your pipeline -- reject batches below your quality threshold.
- provenance/: Full lineage tracking. Every batch records exactly which prompt templates, clinical ontologies, and pipeline versions produced it.
- compliance/: Audit trail and review artifacts for regulatory compliance.
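Before ingesting a batch, it is worth checking that the layout above actually arrived intact. A sketch that verifies a core subset of the files; the constant and helper names are illustrative, and you may want to extend the list for your own pipeline:

```python
from pathlib import Path

# Core files from the batch layout; extend as needed for your pipeline.
EXPECTED_FILES = [
    "data/train.json",
    "data/validation.json",
    "data/test.json",
    "quality/quality_report.json",
    "quality/demographic_audit.json",
    "provenance/pipeline_version.txt",
    "manifest.json",
]

def missing_files(batch_dir: str) -> list:
    """Return the expected batch files that are absent from batch_dir."""
    base = Path(batch_dir)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]

# Usage sketch:
# if missing := missing_files("batch_abc123"):
#     raise FileNotFoundError(f"Incomplete batch: {missing}")
```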
Integration Examples
Python -- Load and Iterate
```python
import json
from pathlib import Path

def load_batch(batch_dir: str) -> dict:
    """Load a SynthABA batch into a dict of splits."""
    batch_path = Path(batch_dir)
    splits = {}
    for split in ["train", "validation", "test"]:
        with open(batch_path / "data" / f"{split}.json") as f:
            splits[split] = json.load(f)
    return splits

# Load the batch
batch = load_batch("batch_abc123")

# Iterate over training records
for record in batch["train"]:
    case_type = record["case_type"]
    case_id = record["case_id"]
    severity = record["patient_context"]["severity"]
    print(f"[{case_type}] {case_id} -- severity: {severity}")
```
Python -- Filter by Demographics
```python
# Get all severe cases from the training set
severe_train = [
    r for r in batch["train"]
    if r["patient_context"]["severity"] == "severe"
]
print(f"Severe training cases: {len(severe_train)}")

# Get all telehealth cases
telehealth = [
    r for r in batch["train"]
    if r["patient_context"]["setting"] == "telehealth"
]

# Get Spanish-language records
spanish = [r for r in batch["train"] if r["language"] == "es"]
```
Python -- Quality Gate Check
```python
import json

with open("batch_abc123/quality/quality_report.json") as f:
    qr = json.load(f)

# Reject batches that fail quality gates
if not qr["all_gates_passed"]:
    print(f"Batch failed quality gates. Score: {qr['overall_quality_score']}")
    for gate_name, result in qr["gate_results"].items():
        if not result.get("passed"):
            print(f"  FAILED: {gate_name} -- {result}")
    raise ValueError("Batch did not pass quality gates")

# Check specific thresholds
metrics = qr["metrics"]
assert metrics["valid_schema_rate"] >= 0.99, "Schema validity too low"
assert metrics["duplicate_rate"] < 0.01, "Too many duplicates"
assert metrics["clinical_plausibility_proxy"] >= 0.90, "Clinical plausibility too low"
```
TypeScript -- Generate via API
```typescript
interface GenerateRequest {
  case_type: string;
  count: number;
  language?: "en" | "es";
  client_id?: string;
  order_id?: string;
}

interface GenerateResponse {
  batch_id: string;
  records_generated: number;
  errors: number;
  success_rate: number;
  quality_score: number;
  quality_passed: boolean;
  splits: { train: number; validation: number; test: number };
  ready_for_vlayer: boolean;
  elapsed_seconds: number;
  cases: Record<string, unknown>[];
}

async function generateBatch(
  apiKey: string,
  request: GenerateRequest
): Promise<GenerateResponse> {
  const response = await fetch(
    "https://synthaba-production.up.railway.app/generate",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": apiKey,
      },
      body: JSON.stringify(request),
    }
  );
  if (!response.ok) {
    const error = await response.json();
    throw new Error(`Generation failed (${response.status}): ${error.detail}`);
  }
  return response.json();
}

// Usage
const result = await generateBatch("your-api-key", {
  case_type: "soap_note",
  count: 100,
  language: "en",
});
console.log(`Generated ${result.records_generated} records`);
console.log(`Quality score: ${result.quality_score}`);
console.log(`Splits: ${JSON.stringify(result.splits)}`);
```
HuggingFace Datasets Integration
```python
import json

from datasets import Dataset

# Load a SynthABA batch into a HuggingFace Dataset
with open("batch_abc123/data/train.json") as f:
    train_records = json.load(f)

# Flatten patient_context into top-level columns for easier filtering
flat_records = []
for r in train_records:
    flat = {**r}
    ctx = flat.pop("patient_context", {})
    for k, v in ctx.items():
        flat[f"patient_{k}"] = str(v) if isinstance(v, list) else v
    flat_records.append(flat)

ds = Dataset.from_list(flat_records)
print(ds)
# Dataset({
#     features: ['case_id', 'case_type', 'language', 'difficulty', ...],
#     num_rows: 70
# })

# Filter and map
severe_ds = ds.filter(lambda x: x["patient_severity"] == "severe")
```
Rate Limits & Timeouts
| Parameter | Value |
|--------------------|------------------------------------------------|
| Max batch size | 2,000 records per request |
| Generation timeout | Up to 5 minutes for large batches |
| Recommended batch | 100 -- 500 records for optimal performance |
| Sample endpoint | 5 records max, rate limited per email address |
Recommendations
- For datasets over 2,000 records, issue multiple sequential requests and merge the results client-side.
- Set HTTP client timeouts to at least 300 seconds (5 minutes) for large batches.
- For production pipelines, use batch sizes of 100--500 records. This balances throughput with per-request reliability.
- Monitor elapsed_seconds in the response to calibrate your pipeline scheduling.
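The first recommendation (splitting a large order into sequential requests) reduces to computing per-request counts and concatenating the returned `cases` arrays. A sketch; `chunk_counts` is an illustrative helper, and the `generate_batch` call in the usage comment is hypothetical:

```python
def chunk_counts(total: int, batch_size: int = 500) -> list:
    """Split a large order into per-request counts within the API limits."""
    if not 1 <= batch_size <= 2000:
        raise ValueError("batch_size must be between 1 and the 2,000-record limit")
    counts = []
    remaining = total
    while remaining > 0:
        take = min(batch_size, remaining)
        counts.append(take)
        remaining -= take
    return counts

# Usage sketch: one /generate call per count, merging results client-side.
# all_cases = []
# for n in chunk_counts(5000):
#     resp = generate_batch(api_key, {"case_type": "soap_note", "count": n})
#     all_cases.extend(resp["cases"])
```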
Versioning
Every record and every batch tracks version information for full reproducibility.
Version Fields in Records
| Field | Example | Description |
|---------------------|-----------|-------------------------------------|
| generator_version | 1.0.0 | SynthABA pipeline version |
| template_version | 1.0.0 | Schema template version |
| generated_by | claude-sonnet | Model used for content generation |
Version Files in Batches
- provenance/pipeline_version.txt -- Plain text pipeline version
- provenance/ontology_version.yaml -- Clinical vocabulary version
- provenance/generation_manifest.yaml -- Full generation metadata
Compatibility
When the pipeline version changes, the JSON Schema may gain new fields. New fields are always additive (existing fields are never removed or renamed). Pin to a specific generator_version in your data loading code if you need strict schema stability:
```python
expected_version = "1.0.0"
for record in records:
    assert record["generator_version"] == expected_version, (
        f"Unexpected version: {record['generator_version']}"
    )
```
Support
- Technical support: support@synthaba.com
- API issues: Always include the batch_id from the response in your support request.
- Bug reports: Include the full response body (or at minimum batch_id, quality_score, and any error messages).
- Schema questions: Reference the JSON Schema files in documentation/schemas/ within your batch -- they are the authoritative field specification.