
Deloitte and NVIDIA offer a solution to help address AI token drift

Addressing AI hallucinations through deterministic AI in the enterprise

Deloitte and NVIDIA now have a solution to help tackle one of enterprise AI’s toughest challenges: token drift. This breakthrough enables more accurate and reliable AI results, helping organizations increase the impact of their digital transformation.


A roadblock to deterministic AI: Token drift explained

Until now, enterprises faced a limitation in large language model (LLM) computation: even when configured for deterministic behavior, LLMs could produce different outputs for the same prompt. In Deloitte’s testing and agentic development, which spanned financial analysis platforms and data engineering workflows, the same SQL-generation query typically returned the correct column reference but occasionally drifted to the wrong one.

In use cases like SQL code generation or named-entity recognition (NER), where a single token represents the only correct answer, this “token drift” meant unpredictable and inaccurate outcomes. For financial workflows, data pipelines, or code generation, this is an unacceptable level of risk.
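As an illustration of how such drift can be surfaced, the sketch below sends the same prompt repeatedly with deterministic settings and counts the distinct completions. It assumes an OpenAI-compatible inference endpoint, such as the one NIM exposes; the endpoint URL, model identifier, and prompt are illustrative placeholders, not Deloitte’s actual test harness.

```python
# Minimal drift probe: send an identical prompt repeatedly with
# deterministic settings and count how many distinct completions come back.
# Endpoint URL, model id, and prompt below are illustrative placeholders.
from collections import Counter

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

PROMPT = "Return only the column name that stores the invoice total in table `billing`."

completions = Counter()
for _ in range(20):
    response = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",   # placeholder model id
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,
        max_tokens=16,
    )
    completions[response.choices[0].message.content.strip()] += 1

# With token drift eliminated, this prints exactly one entry.
for text, count in completions.items():
    print(f"{count:>2}x  {text!r}")
```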
 

Identifying the root cause: Tracing the source of token drift and AI hallucinations

Deloitte and NVIDIA traced the root cause of token drift to the compounding effects of floating-point arithmetic in GPU kernels—you might think of this as the butterfly effect of token drift. Even small arithmetic differences at the CUDA (Compute Unified Device Architecture) level can cascade into divergent token predictions. To address this, Deloitte worked with NVIDIA to introduce specialized NIM flags, first provided to Deloitte in 2024 and released publicly in NVIDIA NIM™ 1.10. Building on that foundation, the organizations developed a deployment approach that defines execution order and kernel selection to eliminate drift.
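The numerical effect is easy to reproduce outside a GPU. The toy sketch below uses NumPy and deliberately extreme values to show that reordering the same float32 additions changes the result, and that when two candidate-token logits are nearly tied, a difference of that size is enough to flip which token is selected; it illustrates the mechanism only, not the actual CUDA kernels involved.

```python
import numpy as np

# Floating-point addition is not associative: the same three float32 values
# summed in a different order give different results.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(0.25)
order_1 = (a + b) + c   # 0.25
order_2 = a + (b + c)   # 0.0, because 0.25 is lost when added to -1e8 in float32
print(order_1, order_2)

# GPU kernels may legally reduce in either order. If two candidate-token
# logits are nearly tied, that tiny discrepancy decides which token wins.
logits_run_1 = np.array([order_1, 0.1], dtype=np.float32)
logits_run_2 = np.array([order_2, 0.1], dtype=np.float32)
print(np.argmax(logits_run_1), np.argmax(logits_run_2))  # 0 vs. 1: different tokens
```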
 

Putting the solution into practice: Actions for an AI workflow

The specialized NIM flags are available for testing with Llama 3.1 models. On NVIDIA SXM systems, the solution works across any number of GPUs; trials on non-SXM configurations can use the specially introduced TP2 (two-GPU tensor-parallel) profile. Deloitte also recommends:

Set temperature=0 for accuracy-critical agents
For workflows where creativity is not desired, such as NER or SQL generation, this setting is mandatory. It yields high-quality, reproducible results.

Apply the same principle to LLM-as-a-judge scenarios
Evaluation agents must return consistent judgments to keep the development environment stable. Without fixed outputs, regression testing is unreliable and product development slows; a minimal sketch of such a regression check follows.
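As a sketch of these recommendations in practice, the pytest-style test below pins temperature=0 for a judge agent and asserts that its verdict is identical across runs and matches a pinned expected value. It assumes the same illustrative OpenAI-compatible endpoint and placeholder model identifier as the earlier sketch; the prompt and expected verdict are hypothetical.

```python
# Regression test for an LLM-as-a-judge agent (pytest style).
# With deterministic inference, the judge's verdict for a fixed input can be
# pinned and asserted exactly, making evaluation usable in CI/CD.
# Endpoint, model id, prompt, and expected verdict are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

JUDGE_PROMPT = (
    "You are a strict SQL reviewer. Answer PASS or FAIL only.\n"
    "Query: SELECT total_amount FROM billing WHERE invoice_id = 42;"
)

def judge_verdict() -> str:
    response = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",   # placeholder model id
        messages=[{"role": "user", "content": JUDGE_PROMPT}],
        temperature=0,                        # no sampling randomness
        max_tokens=4,
    )
    return response.choices[0].message.content.strip()

def test_judge_is_deterministic():
    # Two identical calls must yield the same verdict once drift is eliminated.
    assert judge_verdict() == judge_verdict()

def test_judge_matches_pinned_verdict():
    # Pinned expected value; update deliberately when the prompt or model changes.
    assert judge_verdict() == "PASS"
```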
 

Implications for enterprise AI: Enabling new standards for AI adoption

Deloitte and NVIDIA are working together to solve a major technical challenge and elevate expectations for enterprise AI. Eliminating token drift can establish a foundation of trust, reliability, and scale that transforms how businesses can deploy AI.

For Deloitte and its clients
The new NVIDIA-enabled approach enhances accuracy and reliability within Deloitte’s enterprise agentic portfolio, called Zora AI™.

For enterprises
This breakthrough mitigates a major barrier to operational AI, enabling mission-critical workflows with accuracy and repeatability. It also unlocks automation in CI/CD pipelines and continuous evaluation frameworks.

For technology leaders
Eliminating token drift makes enterprise AI more predictable, testable, and safe, offering a clear path to deployment in regulated industries.

For market innovation
With reliable inference and execution, enterprises can scale advanced training, fine-tuning, and evaluation methods, accelerating time-to-market for new applications.
