Closing the Testing Gap with Synthetic Data

Christina Yasini, product lead for Ecosystem Code, outlines a recorded session on “closing the testing gap with synthetic data,” defining synthetic data as computationally generated data that matches real-world statistical patterns, structural constraints, and domain logic without using real records. She contrasts rule-based, statistical, and model-driven approaches, emphasizing model-driven, domain-aligned synthetic data generated from the same ontology as the code to test systems against their original intent and reduce “false confidence” before deployment. The talk covers cross-industry use cases (ML training, finance/privacy, software testing, load testing, demos), limitations of common test data sources, and how synthetic data can support post-deployment simulation and continuous calibration. Jay then demos Ecosystem Code/Workbench capabilities for generating synthetic data (e.g., Postgres/Mongo), discusses quality metrics, and addresses creating deliberately “poor quality” error data.

00:31 Why Synthetic Data Matters
05:41 Defining Synthetic Data
08:53 Model Driven Generation
10:14 Industry Use Cases
16:01 Synthetic Data for LLMs
19:05 Test Data Source Gaps
22:09 False Confidence Failures
26:58 Domain Aligned Solution
28:55 Workflow Continuity Loop
33:34 Ecosystem Code Overview
36:26 Live Demo Walkthrough
42:14 Q&A on Quality Metrics