Testing non-deterministic systems with confidence

A conventional unit test assumes certainty: one run, one verdict. AI systems do not behave that way.

PUnit is an experimentation and probabilistic testing framework for the regression testing of non-deterministic systems and services.


The Problem

Deterministic testing assumes that identical input always produces identical output.

Modern AI systems increasingly break this assumption.

LLM-based applications, recommendation engines, retrieval pipelines, and probabilistic models often behave differently between executions, even when the input remains unchanged. Testing them requires statistical approaches rather than deterministic assertions alone.

Traditional assertions like:

assertEquals(expected, actual);

are often no longer sufficient.

The real question becomes:

Is the system behaving acceptably within statistically defined boundaries?

PUnit helps teams answer exactly this question.
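
One way to make "statistically defined boundaries" concrete is to run the same check many times and ask whether a confidence bound on the observed pass rate clears a required threshold. The sketch below is plain Java and does not use PUnit's API. The class PassRateCheck and its methods are hypothetical names, and the Wilson score interval is an assumed choice of method that this page does not prescribe.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of PUnit: judges a pass rate by its
// Wilson score lower bound instead of by a single run.
final class PassRateCheck {

    /** Runs the trial n times and returns how many runs passed. */
    static int countPasses(BooleanSupplier trial, int n) {
        int passes = 0;
        for (int i = 0; i < n; i++) {
            if (trial.getAsBoolean()) {
                passes++;
            }
        }
        return passes;
    }

    /**
     * Lower bound of the Wilson score interval for a binomial proportion.
     * z = 1.645 corresponds to a one-sided 95% confidence level.
     */
    static double wilsonLowerBound(int passes, int n, double z) {
        if (n <= 0 || passes < 0 || passes > n) {
            throw new IllegalArgumentException("need n > 0 and 0 <= passes <= n");
        }
        double p = (double) passes / n;
        double z2 = z * z;
        double centre = p + z2 / (2.0 * n);
        double margin = z * Math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n));
        return (centre - margin) / (1.0 + z2 / n);
    }

    /** True when the lower confidence bound reaches the required pass rate. */
    static boolean meets(int passes, int n, double threshold, double z) {
        return wilsonLowerBound(passes, n, z) >= threshold;
    }
}
```

Under this rule, 95 passes out of 100 runs do not demonstrate a 95% pass rate, because the lower bound falls below the threshold. 100 passes out of 100 do. A stricter or looser rule is equally possible; the point is that the verdict depends on the sample size as well as the observed rate.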


What is PUnit?

PUnit extends JUnit 5 with statistical testing capabilities for AI systems, LLM applications, and stochastic software behaviour.

PUnit integrates directly into existing Java testing and CI/CD workflows.

PUnit is built around three core ideas:

Measure

Assert statistical behaviour instead of relying solely on deterministic assertions.

Regress

Detect statistically significant degradation before it reaches production.

Trust

Generate measurable and reproducible evidence for AI-enabled systems.
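
To illustrate what "statistically significant degradation" can mean, the sketch below compares a baseline pass rate with a current one using a one-sided two-proportion z-test. It is plain Java, not PUnit's API. RegressionCheck and its methods are hypothetical names, and the z-test is an assumed technique chosen for brevity; this page does not state which test PUnit applies.

```java
// Hypothetical helper, not part of PUnit: flags a drop in pass rate
// only when it is unlikely to be explained by sampling noise.
final class RegressionCheck {

    /**
     * z statistic of a two-proportion test with pooled variance.
     * Positive values mean the current pass rate is below the baseline.
     */
    static double zStatistic(int basePasses, int baseN, int curPasses, int curN) {
        if (baseN <= 0 || curN <= 0) {
            throw new IllegalArgumentException("sample sizes must be positive");
        }
        double baseRate = (double) basePasses / baseN;
        double curRate = (double) curPasses / curN;
        double pooled = (double) (basePasses + curPasses) / (baseN + curN);
        double se = Math.sqrt(pooled * (1.0 - pooled) * (1.0 / baseN + 1.0 / curN));
        if (se == 0.0) {
            // Both samples passed every run or failed every run: no evidence of a difference.
            return 0.0;
        }
        return (baseRate - curRate) / se;
    }

    /** True when the drop exceeds the critical value (1.645 for one-sided 5%). */
    static boolean degraded(int basePasses, int baseN, int curPasses, int curN, double zCritical) {
        return zStatistic(basePasses, baseN, curPasses, curN) > zCritical;
    }
}
```

With 1000 samples on each side, a drop from 970 to 900 passes is flagged as a degradation, while a drop from 970 to 965 is not: the smaller difference is well within what sampling noise alone would produce.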

Example

In practice, probabilistic tests can look like this:

class GreetingServiceTest {

    @ProbabilisticTest
    void serviceGreetsConsistently() {

        PUnit.testing(
            Sampling.of(
                nf -> new GreetingService(),
                100,
                List.of("Alice", "Bob", "Charlie")
            )
        )
        .criterion(
            PassRate.meeting(0.95, ThresholdOrigin.SLA)
        )
        .assertPasses();
    }
}

PUnit transforms repeated executions into statistical evidence instead of relying on single deterministic outcomes.
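
Why the number of executions matters can be shown with the binomial distribution: a system whose true pass rate is below the target can still produce a good-looking sample by chance. The sketch below computes that chance exactly. It is plain Java, independent of PUnit; BinomialTail and atLeast are hypothetical names introduced only for illustration.

```java
// Hypothetical helper, not part of PUnit: exact upper tail of a binomial distribution.
final class BinomialTail {

    /**
     * Returns P(X >= k) for X ~ Binomial(n, p).
     * Intended for modest n; for very large n the starting term (1 - p)^n
     * can underflow to zero and a log-space computation would be needed.
     */
    static double atLeast(int k, int n, double p) {
        if (n < 0 || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need n >= 0 and 0 <= p <= 1");
        }
        if (k <= 0) return 1.0;
        if (k > n) return 0.0;
        if (p == 0.0) return 0.0;
        if (p == 1.0) return 1.0;

        double pmf = Math.pow(1.0 - p, n);   // P(X = 0)
        double ratio = p / (1.0 - p);
        double tail = 0.0;
        for (int i = 0; i <= n; i++) {
            if (i >= k) {
                tail += pmf;
            }
            // Recurrence: P(X = i + 1) = P(X = i) * (n - i) / (i + 1) * p / (1 - p)
            pmf *= ratio * (n - i) / (i + 1.0);
        }
        return Math.min(1.0, tail);
    }
}
```

For example, BinomialTail.atLeast(95, 100, 0.90) gives the probability that a service with a true pass rate of 90% still shows at least 95 passes in 100 samples. Increasing the sample count shrinks that probability, which is the sense in which repeated executions accumulate evidence.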


About the Project

PUnit is an open-source, actively evolving project exploring probabilistic approaches to software testing for non-deterministic systems.