Powered by Claude Opus 4.7

Production-Grade Synthetic Behavioral Health Data

Clinician-validated, HIPAA-safe, zero PHI. 25 document types across four disciplines — ABA, psychotherapy, speech-language pathology, and occupational therapy. Schema-validated, audit-trailed, ready to train.

25 Document Types

163+ PHI Detection Rules

10-Gate Quality Pipeline

SHA-256 Provenance Hash

BCBA-Designed

Pipeline architected by a BCBA-candidate clinician with a master's in behavior analysis. Dual-critic AI review is validated against BCBA-D clinical standards.

Zero PHI

163+ rule VLayer PHI scanner — no real patient data, ever

Full Audit Trail

VLayer compliance passport with SHA-256 provenance, quality gates, and chain-of-custody for every record

VLayer Scanned

Every record passes through automated PHI detection and synthetic validation via VLayer.

VLayer Compliance Pipeline

Every record passes through four verification stages before release.

01 Ingest

Raw document enters the generation pipeline

02 PHI Scan

163+ rules verify zero real patient data

03 Validation

Clinical plausibility and schema conformance

04 Release

SHA-256 hash sealed in compliance passport
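
As a rough illustration of the four stages above, the Python sketch below chains ingest, a PHI scan, validation, and release. The two regex rules, the required-field set, and every function name are hypothetical stand-ins; the actual VLayer rule set and passport format are not described on this page.

```python
import hashlib
import json
import re

# Hypothetical stand-ins for PHI rules; the real scanner's 163+ rules are not shown here.
PHI_RULES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

# Assumed minimal schema: fields every record must carry.
REQUIRED_FIELDS = {"doc_type", "discipline", "body"}


def ingest(raw: str) -> dict:
    """Stage 01: parse the raw generated document."""
    return json.loads(raw)


def phi_scan(record: dict) -> list[str]:
    """Stage 02: return the names of any PHI rules that match the body."""
    text = record.get("body", "")
    return [name for name, rule in PHI_RULES.items() if rule.search(text)]


def validate(record: dict) -> bool:
    """Stage 03: check schema conformance (field presence only in this sketch)."""
    return REQUIRED_FIELDS.issubset(record)


def release(record: dict) -> dict:
    """Stage 04: seal a SHA-256 hash of the canonical JSON into a passport."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"record": record, "passport": {"sha256": digest}}


def run_pipeline(raw: str) -> dict:
    """Run all four stages; any failed gate stops the record from shipping."""
    record = ingest(raw)
    hits = phi_scan(record)
    if hits:
        raise ValueError(f"PHI rules triggered: {hits}")
    if not validate(record):
        raise ValueError("schema validation failed")
    return release(record)
```

Raising on the first failed gate means a record can only reach the release stage, and receive a hash, after every earlier check has passed.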

25 Clinical Document Types

Four clinical disciplines. Every document type used in real practice — schema-validated with Pydantic and clinically plausible.

All 25 document types are processed through our VLayer compliance pipeline: PHI scanning (163+ rules), synthetic validation, classification, and provenance passport with SHA-256 audit trail.

ABA

Applied Behavior Analysis

10 types

SOAP Notes

Session documentation — subjective, objective, assessment, plan

ABC Data

Antecedent-Behavior-Consequence recording sheets

Treatment Goals

SMART goals with baselines, targets, and measurement criteria

Insurance Auth

Authorization requests with clinical rationale per insurer

Crisis Plans

Behavior intervention plans for emergency situations

Supervision Notes

BCBA supervision session documentation

Discharge Summaries

Treatment completion reports with outcome data

Progress Reports

Periodic progress updates with quantitative metrics

Session Data

Raw trial-by-trial and interval recording data

Assessment Sections

VB-MAPP, ABLLS-R, and standardized assessments

PSY

Psychotherapy & Mental Health

8 types

Psychotherapy Note

Individual therapy sessions — CBT, DBT, EMDR modalities

Psychiatric Eval

Initial psychiatric evaluation with HPI and MSE

Mental Status Exam

Full 12-domain mental status examination

Safety Plan

Suicide prevention — Stanley-Brown safety planning

Group Therapy Note

Group sessions — process, psychoeducational, skills training

Family Therapy Note

Family therapy with dynamics and interventions

Psychological Testing

Cognitive, emotional, and personality assessment reports

Substance Abuse Assessment

SUD evaluation with DSM-5 criteria and staging

SLP

Speech-Language Pathology

4 types

SLP Evaluation

Initial speech-language evaluation with standardized tests

SLP Session Note

Treatment session with cueing hierarchy and accuracy data

SLP Progress Report

Periodic progress with goal tracking and justification

SLP Treatment Plan

Goals, objectives, frequency, and approaches

OT

Occupational Therapy

3 types

OT Evaluation

Initial OT eval — motor, sensory, self-care, VMI

OT Session Note

Treatment session with performance data and assistance level

OT Treatment Plan

Functional goals, modalities, and home program

How SynthABA ensures clinical quality at scale

Our generation pipeline implements the four principles of Google Research's Simula framework, specialized for behavioral health. The result: coverage of the long tail, not just the easy cases.

Taxonomic coverage

Every dataset samples from a 113-node clinical taxonomy spanning behaviors, interventions, assessments, comorbidities, settings, phases, and barriers. Coverage ratio reported per dataset.
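
One plausible reading of "coverage ratio" is the share of taxonomy nodes that appear in at least one sampled path. The sketch below computes that figure in Python; the definition, the function name, and the four-node toy taxonomy are assumptions for illustration, not the documented metric.

```python
def coverage_ratio(taxonomy_nodes: set[str], sampled_paths: list[list[str]]) -> float:
    """Fraction of taxonomy nodes touched by at least one sampled path."""
    if not taxonomy_nodes:
        raise ValueError("taxonomy must contain at least one node")
    # Ignore any path entries that are not part of the taxonomy.
    covered = {node for path in sampled_paths for node in path} & taxonomy_nodes
    return len(covered) / len(taxonomy_nodes)


# Toy taxonomy with hypothetical node names (the real one has 113 nodes).
TAXONOMY = {
    "behavior/elopement",
    "behavior/self_injury",
    "setting/home",
    "setting/school",
}
paths = [
    ["behavior/elopement", "setting/home"],
    ["behavior/elopement", "setting/school"],
]
ratio = coverage_ratio(TAXONOMY, paths)  # 3 of 4 nodes are covered
```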

Dual-clinician review

Every document is reviewed independently by two AI clinicians: one checks accuracy, the other actively looks for errors. A third reviewer reconciles their verdicts. This breaks the sycophancy bias that plagues single-LLM validation.
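
The review flow described above can be sketched as two independent verdicts plus a reconciliation step. In this Python sketch the `Verdict` type, the status labels, and the rule that disagreement escalates to the third reviewer are all assumptions; the page does not specify how reconciliation is decided.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    reviewer: str                      # e.g. "accuracy" or "adversarial"
    passed: bool
    issues: list[str] = field(default_factory=list)


def reconcile(accuracy: Verdict, adversarial: Verdict) -> dict:
    """Combine two independent verdicts; disagreement goes to a third reviewer."""
    if accuracy.passed and adversarial.passed:
        status = "accept"
    elif not accuracy.passed and not adversarial.passed:
        status = "reject"
    else:
        status = "escalate"  # assumed hand-off point for the third reviewer
    # Keep every issue either critic raised so the audit trail stays complete.
    issues = sorted(set(accuracy.issues + adversarial.issues))
    return {"status": status, "issues": issues}
```

Keeping the two critics independent until this step is what prevents one model from simply agreeing with the other.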

Calibrated complexity scoring

Each document is scored 1-10 on clinical complexity and calibrated with Elo ratings through pairwise comparison. Your team filters for edge cases the simple way: complexity_elo > 1700
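
For readers unfamiliar with Elo calibration, the sketch below shows the standard Elo update applied to a pairwise "which document is more complex?" judgment, followed by the filter mentioned above. The K-factor of 32, the 1500 starting rating, and the sample records are assumptions; only the `complexity_elo > 1700` threshold comes from this page.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is judged more complex than B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update_elo(r_a: float, r_b: float, a_is_more_complex: bool, k: float = 32.0):
    """Return updated (r_a, r_b) after one pairwise comparison."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_is_more_complex else 0.0
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b


# Two documents start at an assumed rating of 1500; A wins the comparison.
rating_a, rating_b = update_elo(1500.0, 1500.0, a_is_more_complex=True)

# Filtering for edge cases with the threshold quoted on this page.
docs = [
    {"id": "doc-1", "complexity_elo": 1450.0},
    {"id": "doc-2", "complexity_elo": 1725.0},
]
edge_cases = [d for d in docs if d["complexity_elo"] > 1700]
```

Because each update moves the two ratings by equal and opposite amounts, the total rating across the dataset stays constant and only the relative ordering changes.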

Full provenance trail

Every document ships with a complete audit trail: taxonomy path sampled, complexity scores, both critic verdicts, HIPAA scan results, model versions. Auditable by your legal and ML teams.
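
A provenance record of this kind could look roughly like the Python sketch below, which stores the audit fields listed above next to a SHA-256 hash of the document so that later tampering is detectable. All field names and the canonical-JSON hashing scheme are assumptions; the actual passport format is not published here.

```python
import hashlib
import json


def content_hash(document: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_provenance(document: dict, audit: dict) -> dict:
    """Attach hypothetical audit fields to the document's hash."""
    return {"sha256": content_hash(document), **audit}


def verify(document: dict, provenance: dict) -> bool:
    """Recompute the hash and compare it with the one recorded at release."""
    return content_hash(document) == provenance["sha256"]


document = {"doc_type": "progress_report", "body": "Synthetic example text."}
provenance = build_provenance(
    document,
    {
        "taxonomy_path": ["behavior/elopement", "setting/school"],
        "complexity_score": 7,
        "complexity_elo": 1725.0,
        "critic_verdicts": {"accuracy": "pass", "adversarial": "pass"},
        "phi_scan_passed": True,
        "model_versions": {"generator": "hypothetical-v1"},
    },
)
```

Sorting keys before hashing matters: two dictionaries with the same content but different key order would otherwise produce different hashes.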

Dataset Pricing

One-time purchase. You own the data. Includes full quality reports, documentation, and schema exports.

Standard

$25,000

1,000 documents

Complexity 1-6. Coverage across common clinical scenarios.

  • Schema-validated JSON
  • Train/val/test splits
  • Dual-critic verification
  • Taxonomic coverage report
  • VLayer compliance scan included
Request Sample

Most Popular

Premium Edge Case

$85,000

500 documents

6.8× the Standard per-document rate ($170 vs. $25 per document), justified by the edge cases your model fails on

Complexity 7-10. Rare clinical combinations that break models in production.

  • Everything in Standard
  • Elo-calibrated complexity (> 1700)
  • Rare comorbidity combinations
  • Priority generation
  • SHA-256 provenance passport
Request Sample

Enterprise

Custom

Custom volume

Multi-discipline datasets, custom document types, SLA, dedicated support.

  • Everything in Premium Edge Case
  • Multi-discipline datasets (ABA, SLP, OT, psych)
  • Custom schema modifications
  • Dedicated BCBA review
  • SLA + dedicated support
Request Sample

See the data before you buy

Enter your work email and we'll send you a free sample dataset — 5 records across document types so you can evaluate schema quality, clinical plausibility, and format compatibility.