The Ghost in the CI: why embedded tests flake

The ghost 150 million miles away

In 1997, the Mars Pathfinder started mysteriously rebooting on the surface of Mars. The culprit was not a broken component but a priority inversion in its scheduler.

Simplified priority inversion: the medium-priority task keeps running, starving the low-priority task that holds the lock and so indirectly blocking the high-priority task from getting the resource it needs.

A meteorological task (low priority) held a shared lock that the critical bus manager (high priority) needed. A communications task (medium priority) preempted the low-priority task and ran so long that the lock was never released, leaving the bus manager blocked. The watchdog timer, sensing the stall, rebooted the entire system. Without deterministic execution, timing-dependent bugs like this are almost impossible to catch before they reach their destination.
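The scenario above can be replayed in a few lines. What follows is a minimal sketch, not Pathfinder's actual scheduler: a tick-based, fixed-priority simulation in which the task names, tick counts, and watchdog deadline are all made up for illustration. The `inherit` flag toggles priority inheritance, which is the widely reported fix for the Pathfinder resets.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    name: str
    priority: int        # larger number = more urgent
    release: int         # tick at which the task becomes ready
    work: int            # ticks of CPU time it needs
    needs_lock: bool     # True if all of its work happens under the shared lock
    done: int = 0
    finished_at: Optional[int] = None

def simulate(inherit: bool, max_ticks: int = 50) -> dict:
    """Run the three-task scenario and return each task's finish tick."""
    tasks = [
        Task("weather", priority=1, release=0, work=3, needs_lock=True),
        Task("comms", priority=2, release=2, work=10, needs_lock=False),
        Task("bus", priority=3, release=1, work=1, needs_lock=True),
    ]
    owner = None  # task currently holding the shared lock

    for tick in range(max_ticks):
        ready = [t for t in tasks if t.release <= tick and t.finished_at is None]

        def is_blocked(t):
            return t.needs_lock and owner is not None and owner is not t

        blocked = [t for t in ready if is_blocked(t)]
        runnable = [t for t in ready if not is_blocked(t)]
        if not runnable:
            continue

        def effective_priority(t):
            # With inheritance, the lock owner borrows the priority of its waiters.
            if inherit and t is owner and blocked:
                return max(b.priority for b in blocked + [t])
            return t.priority

        current = max(runnable, key=effective_priority)
        if current.needs_lock and owner is None:
            owner = current
        current.done += 1
        if current.done == current.work:
            current.finished_at = tick + 1
            if owner is current:
                owner = None  # release the lock on completion

    return {t.name: t.finished_at for t in tasks}

WATCHDOG_DEADLINE = 8  # hypothetical: the bus task must finish by this tick

for inherit in (False, True):
    finish = simulate(inherit)
    status = "ok" if finish["bus"] <= WATCHDOG_DEADLINE else "watchdog reset"
    print(f"inherit={inherit}: bus finished at tick {finish['bus']} ({status})")
```

Because time here is a loop counter rather than a wall clock, the inversion shows up on every run: without inheritance the bus task finishes at tick 14 and misses the deadline; with inheritance it finishes at tick 4.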

Why firmware flakes

Flakiness is often a symptom of race conditions: bugs that appear only when specific events happen in a precise order, inside a timing window that can be as narrow as a few microseconds. The Therac-25 tragedy is the best-known example: a race condition in the control software allowed a high-energy beam to fire without a safety shield, but only when the operator entered keyboard commands "too fast".

The race condition: if the operator corrected a setting within 8 seconds, the software would sometimes fail to update the hardware state correctly, leading to an unshielded beam.
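To make the shape of that race concrete, here is a toy model. It is not the Therac-25 code: the only number taken from the text is the 8-second window, and the mode names and the function `beam_configuration` are hypothetical. In this simplified version an edit inside the window is always missed, whereas the real failure was intermittent.

```python
SETUP_SECONDS = 8   # window from the text; everything else is hypothetical
HORIZON = 12        # simulated seconds covered by one session

def beam_configuration(edit_at: int) -> tuple:
    """Return (mode the operator asked for, mode the hardware ends up in)."""
    requested, hardware = "xray", "xray"
    for second in range(HORIZON):
        setup_running = second < SETUP_SECONDS
        if second == edit_at:
            requested = "electron"      # operator corrects the setting
            if not setup_running:
                hardware = requested    # edits after setup are re-applied
            # Bug: an edit during setup changes the request but is never re-read.
    return requested, hardware

slow_edit = beam_configuration(edit_at=10)
fast_edit = beam_configuration(edit_at=3)
print("slow operator:", slow_edit)  # request and hardware agree
print("fast operator:", fast_edit)  # request and hardware disagree
```

A test that always types at human speed exercises only the first case. The second case needs a harness that decides when the edit arrives.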

1. The "real-time" fallacy

Conventional testing relies on real-world time. If your CI runner doesn't happen to hit the exact timing window that triggers the bug, the test passes. You get a green check, but the bug is still there. This is the real-time fallacy: thinking that testing in "real-time" is the same as testing all timing possibilities.

2. Non-deterministic jitter

Because host PCs have varying background loads, a test that passes at 10:00 AM might fail at 2:00 PM simply because of a background update. This jitter masks real bugs and turns CI into a game of chance. To catch a Therac-25 style bug, you need to control time, not just watch it.
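The difference between watching time and controlling it can be sketched as random sampling versus an exhaustive sweep. The firmware below is a stand-in: `firmware_ok`, the 1000-microsecond window, and the single bad offset are all invented for the example.

```python
import random

WINDOW_US = 1000      # hypothetical range of interrupt arrival offsets, in microseconds
BAD_OFFSET_US = 437   # hypothetical single offset at which the race fires

def firmware_ok(irq_offset_us: int) -> bool:
    """Stand-in for one test run with the interrupt injected at a known offset."""
    return irq_offset_us != BAD_OFFSET_US

# Wall-clock style: each CI run lands on whatever offset the host happens to produce.
rng = random.Random(0)  # seeded only so this sketch is repeatable
lucky_runs = [firmware_ok(rng.randrange(WINDOW_US)) for _ in range(20)]

# Simulated-time style: the harness picks the offset, so it can try every one.
failing_offsets = [us for us in range(WINDOW_US) if not firmware_ok(us)]

print(f"random runs passed: {sum(lucky_runs)}/20")
print(f"sweep found failing offsets: {failing_offsets}")
```

With a 1-in-1000 window, twenty random runs hit the bad offset only about 2% of the time, while the sweep finds it on every run.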

The fix: separate simulated time from host time

To kill flakiness, we enforce a strict separation between simulated time and host time. The 1991 Patriot Missile failure shows how much damage a small timing error can do: a tiny truncation error in a 24-bit representation of time accumulated over 100 hours into a 0.34-second lag, enough to miss its target by half a kilometer.

The accumulation of error: in the Patriot system, a relative error of roughly 0.0001% per clock tick was negligible in a 1-minute test, but fatal after 100 hours of continuous operation.
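The arithmetic is easy to reproduce. The sketch below assumes the commonly cited details of the failure, none of which are stated in this article: the clock counted tenths of a second, 0.1 was truncated to 23 bits after the binary point, and the incoming missile travelled at about 1676 m/s.

```python
from fractions import Fraction

FRACTION_BITS = 23        # assumption: bits after the binary point in the 24-bit register
TICK = Fraction(1, 10)    # assumption: the clock counts tenths of a second
TARGET_SPEED_M_S = 1676   # assumption: commonly cited speed of the incoming missile

# 0.1 has no finite binary expansion, so the stored value is cut short.
stored_tick = Fraction(int(TICK * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = TICK - stored_tick

def drift_seconds(uptime_seconds: int) -> float:
    """Accumulated clock error after the given uptime."""
    ticks = uptime_seconds * 10
    return float(error_per_tick * ticks)

print(f"relative error per tick: {float(error_per_tick / TICK):.7%}")
print(f"drift after 1 minute:    {drift_seconds(60):.6f} s")
print(f"drift after 100 hours:   {drift_seconds(100 * 3600):.4f} s")
print(f"tracking error:          {drift_seconds(100 * 3600) * TARGET_SPEED_M_S:.0f} m")
```

Under these assumptions the drift after one minute is under a tenth of a millisecond, and after 100 hours it is about 0.34 seconds, which matches the figures above.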

LabWired solves this with a lockstep stepping mechanism. Instead of relying on a "real-time" clock that can drift or be interrupted by host OS noise, every instruction and hardware interrupt is synchronized to a fixed global simulation clock. There is zero drift because time only advances when the core steps.
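As a rough illustration of the idea, and not LabWired's actual API, the sketch below keeps simulated time as a cycle counter that only the step loop can advance. The `run` function, the timer period, and the trace format are hypothetical. A host delay is added on purpose to show that it has no effect on what the firmware would observe.

```python
import time

TIMER_PERIOD = 4  # hypothetical: a timer interrupt fires every 4 simulated cycles

def run(steps: int, host_delay_s: float = 0.0) -> list:
    """Step a toy core and return the (cycle, event) trace it produced."""
    cycles = 0
    trace = []
    for _ in range(steps):
        time.sleep(host_delay_s)   # host noise: costs wall-clock time only
        cycles += 1                # simulated time advances only when the core steps
        if cycles % TIMER_PERIOD == 0:
            trace.append((cycles, "TIMER_IRQ"))
    return trace

quiet_host = run(12)
busy_host = run(12, host_delay_s=0.001)
print(quiet_host == busy_host)  # the trace does not depend on host speed
print(quiet_host)
```

The interrupt is tied to the cycle count rather than to a host timer, so a slow or noisy runner takes longer to finish but produces the same trace.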

For high-reliability targets, the engine also supports lockstep fault injection: a golden core and a shadow core run the same firmware step for step, and the full register state is compared after every instruction. Inject a bit flip or a skipped instruction into the shadow core, and the simulation halts at the exact instruction where its state diverges from the golden run — with a deterministic snapshot of the failure state.
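The golden/shadow comparison can be sketched with a toy four-register machine. The instruction set, `run_lockstep`, and its parameters are invented for this example and say nothing about how the engine is implemented.

```python
MASK32 = 0xFFFFFFFF

def step(regs: list, instr: tuple) -> None:
    """Execute one instruction of a toy ISA on the given register file."""
    op, dst, a, b = instr
    if op == "addi":
        regs[dst] = (regs[a] + b) & MASK32        # dst = reg[a] + immediate
    elif op == "add":
        regs[dst] = (regs[a] + regs[b]) & MASK32  # dst = reg[a] + reg[b]
    else:
        raise ValueError(f"unknown op {op!r}")

PROGRAM = [
    ("addi", 0, 0, 5),  # r0 = 5
    ("addi", 1, 1, 7),  # r1 = 7
    ("add", 2, 0, 1),   # r2 = r0 + r1
    ("add", 2, 2, 2),   # r2 = r2 + r2
]

def run_lockstep(program, flip_at=None, flip_reg=0, flip_bit=0):
    """Run golden and shadow cores in lockstep; return a snapshot at first divergence."""
    golden, shadow = [0] * 4, [0] * 4
    for pc, instr in enumerate(program):
        step(golden, instr)
        step(shadow, instr)
        if pc == flip_at:
            shadow[flip_reg] ^= 1 << flip_bit  # injected fault: one flipped bit
        if golden != shadow:                   # full register compare after every step
            return {"pc": pc, "golden": list(golden), "shadow": list(shadow)}
    return None  # no divergence

print(run_lockstep(PROGRAM))
print(run_lockstep(PROGRAM, flip_at=1, flip_reg=1, flip_bit=0))
```

The clean run returns `None`. The faulty run stops at instruction 1 with both register files captured, and rerunning it with the same arguments yields the same snapshot.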

The virtual replica: preventing the next ghost

In 1997, it took NASA three weeks of debugging on an exact terrestrial replica of the Pathfinder to find the priority inversion. In 2026, LabWired brings that "virtual replica" capability to every developer's local environment and CI pipeline.

Deterministic CI gates: with lockstep execution, wall-clock luck stops being a factor. If a timing bug exists — like the Therac-25 race condition — it is caught consistently, not intermittently.
Signal-level observability: when a simulation halts, you don't just get a log; you get a deterministic VCD trace. Analyze bus contention and signal shuffling as if you had a logic analyzer attached to the silicon.
Hardened regression testing: scale your validation across 50+ headless runners. Test for drifts that take 100+ hours of simulated run time to appear, like the Patriot Missile case, in minutes of wall-clock time.

Make your firmware CI deterministic →

Fighting flaky hardware tests in your pipeline? Write to contact@labwired.com or book 30 minutes.