SynthABA Product Data Sheet
Production-grade synthetic clinical data for behavioral health AI
What is SynthABA?
SynthABA generates production-grade synthetic clinical documentation for behavioral health AI training. It covers 25 document types across 4 disciplines (ABA, Psychotherapy, Speech-Language Pathology, Occupational Therapy). Every record is schema-validated, clinician-reviewed, and contains zero real patient data.
The Problem
- AI companies need clinical documentation data to train NLP, classification, and generation models
- Real patient data is protected by HIPAA -- acquiring, de-identifying, and managing it costs $500K-$2M+ and takes 6-18 months
- Behavioral health (ABA, SLP, OT) is severely underrepresented in existing datasets
- Poor training data leads to AI tools that make clinical errors
The Solution
SynthABA provides ready-to-train synthetic clinical datasets that are:
- Clinically accurate -- validated by licensed BCBAs, SLPs, OTs, and psychologists
- HIPAA-safe -- generated from templates, not real patient data. Zero PHI guaranteed, with every record scanned against 163+ PHI detection patterns
- Schema-validated -- Pydantic v2 enforcement with type constraints on every field
- Audit-trailed -- full provenance, VLayer passport with SHA-256, chain of custody
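The PHI pattern scan listed above can be pictured as a set of regular expressions run over the text of each record. The sketch below is a minimal illustration using four hypothetical patterns. The names `PHI_PATTERNS` and `scan_for_phi` are assumptions made for this example and do not represent SynthABA's actual 163+ pattern set.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# A production scan would need many more patterns and locale-specific formats.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "mrn": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


def scan_for_phi(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs for every hit in the text."""
    hits = []
    for name, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


def is_phi_free(text: str) -> bool:
    """A record passes this gate only if no pattern matches."""
    return not scan_for_phi(text)
```

In this sketch a record fails the gate as soon as any pattern matches, which favors false positives over missed identifiers.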
25 Document Types, 4 Disciplines
| Discipline | Count | Document Types |
|---|---|---|
| ABA | 10 | SOAP Notes, ABC Data, Treatment Goals, Insurance Auth, Crisis Plans, Supervision Notes, Discharge Summaries, Progress Reports, Session Data, Assessment Sections |
| Psychotherapy | 8 | Psychotherapy Notes, Psychiatric Evals, Mental Status Exams, Safety Plans, Group Therapy Notes, Family Therapy Notes, Psychological Testing, Substance Abuse Assessments |
| Speech-Language Pathology | 4 | Evaluations, Session Notes, Progress Reports, Treatment Plans |
| Occupational Therapy | 3 | Evaluations, Session Notes, Treatment Plans |
Quality Pipeline
Every record passes a 10-gate quality pipeline that enforces clinical and technical standards:
- Schema validation (Pydantic v2 with field-level type constraints)
- Clinical consistency checks across related fields
- Deduplication with similarity thresholds
- 163+ PHI detection patterns (SSN, phone, email, address, MRN, dates, names)
- Demographic balance auditing
- Vocabulary compliance against clinical terminology standards
- Edge case coverage (severity extremes, comorbidities, atypical presentations)
- VLayer compliance passport with SHA-256 hash for tamper detection
- Clinician review with 5-dimension rubric
- Inter-rater reliability (Cohen's Kappa >= 0.61)
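The inter-rater reliability gate can be made concrete with the standard definition of Cohen's Kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance. The sketch below assumes two clinicians each assign one categorical label per record. The function name `cohens_kappa`, the `KAPPA_THRESHOLD` constant, and the pass/fail labels are assumptions for illustration, not the product's actual rubric.

```python
from collections import Counter

KAPPA_THRESHOLD = 0.61  # threshold stated in the pipeline description


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data: the raters agree on 9 of 10 records.
reviewer_1 = ["pass"] * 5 + ["fail"] * 5
reviewer_2 = ["pass"] * 4 + ["fail"] * 6
kappa = cohens_kappa(reviewer_1, reviewer_2)  # 0.8 for this toy data
gate_passed = kappa >= KAPPA_THRESHOLD
```

The 0.61 threshold matches the lower bound of the "substantial agreement" band in the commonly used Landis and Koch interpretation scale.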
Delivery Format
| Component | Details |
|---|---|
| Format | JSON (schema-validated, Pydantic JSON Schema included) |
| Splits | Train (70%) / Validation (15%) / Test (15%), stratified by severity |
| Included artifacts | Quality report, demographic audit, provenance chain, VLayer passport |
| Documentation | Datasheet, Healthsheet, Data Card, Nutrition Label |
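A 70/15/15 split stratified by severity means each severity level is divided in those proportions separately, so every split keeps roughly the same severity mix. The sketch below shows one way to do this with the standard library. The `severity` field name, the level names used in the usage note, and the function `stratified_split` are assumptions for illustration, since the delivered schema is not reproduced here.

```python
import random
from collections import defaultdict


def stratified_split(records, key="severity", ratios=(0.70, 0.15, 0.15), seed=42):
    """Split records into train/validation/test, preserving the mix of `key`."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_stratum = defaultdict(list)
    for record in records:
        by_stratum[record[key]].append(record)

    train, val, test = [], [], []
    for stratum in sorted(by_stratum):  # sorted for a deterministic order
        group = by_stratum[stratum]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # remainder goes to test
    return train, val, test
```

With 20 records at each of three hypothetical severity levels (`mild`, `moderate`, `severe`), this yields 42 training, 9 validation, and 9 test records, with 14, 3, and 3 records per level respectively.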
Pricing
| Tier | Records | Document Types | Price |
|---|---|---|---|
| Starter | 100 | 1 document type | $5,000 |
| Professional | 500 | 5 document types | $25,000 |
| Enterprise | 2,000+ | All 25 types | $100,000 |
All tiers include the full quality pipeline, VLayer passport, and documentation artifacts.
Use Cases
- Train clinical NLP models -- entity extraction, summarization, classification
- Benchmark clinical AI products -- standardized synthetic data for consistent evaluation
- Test EHR integrations -- clinical documentation automation and interoperability
- Academic research -- synthetic data methodology and behavioral health AI
Trust and Compliance
| Dimension | Detail |
|---|---|
| PHI | Zero. Template-based generation, no real patient data as source |
| HIPAA | Safe Harbor compliant |
| Audit trail | Full provenance with SHA-256 hashing |
| Clinical validation | Licensed BCBAs, SLPs, OTs, psychologists |
| Tamper detection | VLayer compliance passport per batch |
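SHA-256 tamper detection works by recomputing a digest over the delivered batch and comparing it with the digest recorded at generation time. The sketch below illustrates that general idea only. The VLayer passport format is not described in this data sheet, so the canonical-JSON serialization and the names `batch_digest` and `verify_batch` are assumptions.

```python
import hashlib
import json


def batch_digest(records):
    """SHA-256 hex digest of a batch, computed over canonical JSON."""
    # Sorted keys and fixed separators make the serialization deterministic,
    # so the same content always produces the same digest.
    canonical = json.dumps(
        records, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_batch(records, expected_digest):
    """Return True only if the batch still matches the recorded digest."""
    return batch_digest(records) == expected_digest
```

Any change to a record, even a single character, produces a different digest, so a mismatch indicates that the batch differs from what was originally hashed.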
Contact
- Website: synthaba.vercel.app
- Email: contact@synthaba.com
- Free sample: Request a 5-record sample at synthaba.vercel.app