Time Appliances Project Call #155, March 25, 2026
Abstract: Modern AI inference pipelines are increasingly distributed across multiple nodes, yet they rely heavily on timestamp-based observability for debugging, tracing, and system understanding. This work explores a critical gap: observability can become causally incorrect even when the system itself remains fully functional. In this session, we will see results from controlled experiments on a multi-node AI inference pipeline, where small clock skews are intentionally introduced at the inference stage. The findings show that while throughput and model outputs remain stable, causality breaks in the observability layer, leading to negative spans and token ordering anomalies. The talk also highlights a key insight: these failures are transport-independent and rooted in time alignment across distributed systems. Additionally, unexpected self-recovery behavior is observed, suggesting that relative clock drift dynamically influences causal correctness. This session will provide a systems-level perspective on why time synchronization is a foundational requirement for trustworthy observability in distributed AI systems, and what this means for future infrastructure design.

Speaker: Ankur Sharma is a technologist at Equinix, where he works on distributed infrastructure/networks, multi-cloud networking, AI systems and time synchronization systems at scale. He is the architect behind Equinix Precision Time™, the industry's first Time-as-a-Service offering, and has led multiple initiatives across AI infrastructure, observability, and hybrid networking platforms. Ankur is actively involved in the broader ecosystem through organizations such as OCP (Time Appliances Project and Unified Intelligent Infrastructure), WSTS (Workshop on Synchronization and Timing Systems), and the Agentic AI Foundation, where he contributes to discussions on timing, observability, and distributed AI systems.
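
Illustration: the negative-span effect mentioned in the abstract can be reproduced with a toy model. The sketch below is not the speaker's experimental setup. The node names, the 200 µs latency, and the skew values are all hypothetical, chosen only to show how a span computed from two different clocks can go negative while the real elapsed time stays the same.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    skew_us: int  # hypothetical clock offset from true time, in microseconds

    def now(self, true_time_us: int) -> int:
        """Timestamp this node would record at the given true time."""
        return true_time_us + self.skew_us


def cross_node_span_us(sender: Node, receiver: Node,
                       send_true_us: int, latency_us: int) -> int:
    """Apparent span from send (sender's clock) to receive (receiver's clock)."""
    sent_at = sender.now(send_true_us)
    received_at = receiver.now(send_true_us + latency_us)
    return received_at - sent_at


gateway = Node("gateway", skew_us=0)
latency_us = 200  # assumed one-way latency; the true elapsed time in every run

for skew in (0, -100, -500):
    inference = Node("inference", skew_us=skew)
    span = cross_node_span_us(gateway, inference,
                              send_true_us=1_000_000, latency_us=latency_us)
    status = "negative span" if span < 0 else "ok"
    print(f"skew={skew:>5} us  apparent span={span:>5} us  {status}")
```

In this model the apparent span equals the true latency plus the receiver's skew relative to the sender, so it turns negative as soon as the receiver's clock lags by more than the latency. The request itself still takes 200 µs in every case, which is the sense in which the system stays functional while the trace becomes causally wrong.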
