Deloitte and NVIDIA now have a solution to help tackle one of enterprise AI’s toughest challenges: token drift. This breakthrough enables more accurate and reliable AI results, helping organizations enhance the impact of their digital transformation.
Until now, enterprises faced a limitation in large language model (LLM) computations. Even when configured for deterministic behavior, LLMs could produce different outputs for the same prompt. In Deloitte’s testing and agentic development, which spanned financial analysis platforms and data engineering workflows, the same SQL-generation query typically returned the correct column reference but occasionally drifted to the wrong column.
In use cases like SQL code generation or named-entity recognition (NER), where a single token represents the only correct answer, this “token drift” meant unpredictable and inaccurate outcomes. For financial workflows, data pipelines, or code generation, this is an unacceptable level of risk.
Deloitte and NVIDIA traced the root cause of token drift to the compounding effects of floating-point arithmetic in GPU kernels, which you might think of as the butterfly effect of token drift. Floating-point addition is not associative, so the order in which a kernel accumulates values changes how the result is rounded. Even small arithmetic differences at the CUDA (Compute Unified Device Architecture) level can cascade into divergent token predictions. To address this, Deloitte worked with NVIDIA to introduce specialized NIM flags, first provided to Deloitte in 2024 and released publicly in NVIDIA NIM™ 1.10. Building on that foundation, the organizations developed a deployment approach that defines execution order and kernel selection to eliminate drift.
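To make the floating-point effect concrete, here is a minimal Python sketch. It is not the Deloitte or NVIDIA implementation and does not run on a GPU. It only shows that accumulating the same numbers in a different order can round differently, and that this can change a greedy (argmax) token choice when two candidates are nearly tied. The token names, the logit values, and the helpers `accumulate`, `greedy_token`, and `pick_column` are assumptions chosen for illustration.

```python
def accumulate(terms):
    """Sum terms left to right; each addition rounds to the nearest float."""
    total = 0.0
    for term in terms:
        total += term
    return total


def greedy_token(logits):
    """Pick the highest-scoring token; ties go to the first entry (Python's max)."""
    return max(logits, key=logits.get)


# The same three contributions, accumulated in two different orders.
terms = [0.1, 0.2, 0.3]
forward = accumulate(terms)                   # (0.1 + 0.2) + 0.3
backward = accumulate(list(reversed(terms)))  # (0.3 + 0.2) + 0.1

# Hypothetical near-tie between two column-name tokens (illustrative value).
RIVAL_LOGIT = 0.6


def pick_column(net_revenue_logit):
    """Greedy choice between two hypothetical column tokens."""
    return greedy_token(
        {"gross_revenue": RIVAL_LOGIT, "net_revenue": net_revenue_logit}
    )


print(forward == backward)    # False: the two orders round differently
print(pick_column(forward))   # net_revenue
print(pick_column(backward))  # gross_revenue
```

The difference between the two sums is a single unit in the last place, yet it is enough to change the selected token. Fixing the accumulation order removes that source of variation, which is the intuition behind a deployment that defines execution order and kernel selection.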
The specialized NIM flags are available for testing with Llama 3.1 models. On NVIDIA SXM systems, the solution works across any number of GPUs. Trials on non-SXM configurations can use the specially introduced TP2 profile. Deloitte also recommends:
Set temperature=0 for accuracy-critical agents
For workflows where creativity is undesired, such as NER or SQL generation, this setting is mandatory. It provides high-quality, reproducible results.
Apply the same principle to LLM-as-a-judge scenarios
Evaluation agents must return consistent verdicts to provide a stable development environment. Without fixed outputs, regression testing is unreliable, and product development slows.
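One way to check these recommendations in practice is a small repeatability test that sends the same prompt many times and counts the distinct outputs. The Python sketch below is an illustration under stated assumptions, not part of the Deloitte or NVIDIA solution. The `generate` argument stands for whatever function calls your model endpoint with temperature=0, and `stable_model` and `make_drifting_model` are hypothetical stand-ins so that the example runs without a model.

```python
from collections import Counter


def measure_drift(generate, prompt, runs=20):
    """Call generate(prompt) repeatedly and tally each distinct output."""
    return Counter(generate(prompt) for _ in range(runs))


def is_reproducible(counts):
    """True when every run returned exactly the same output."""
    return len(counts) == 1


# Hypothetical stand-in for a drift-free endpoint.
def stable_model(prompt):
    return "SELECT net_revenue FROM ledger"


# Hypothetical stand-in that returns the wrong column on every Nth call.
def make_drifting_model(every=7):
    calls = 0

    def model(prompt):
        nonlocal calls
        calls += 1
        column = "gross_revenue" if calls % every == 0 else "net_revenue"
        return f"SELECT {column} FROM ledger"

    return model


prompt = "Write SQL that returns net revenue from the ledger table."
stable_counts = measure_drift(stable_model, prompt)
drift_counts = measure_drift(make_drifting_model(every=7), prompt)

print(is_reproducible(stable_counts))  # True
print(is_reproducible(drift_counts))   # False
print(drift_counts.most_common())
```

A check like this could run in a regression suite before an LLM-as-a-judge agent is trusted to gate releases: if the judge itself fails the repeatability test, a changed score cannot be attributed to the product change under evaluation.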
Deloitte and NVIDIA are working together to solve a major technical challenge and elevate expectations for enterprise AI. Eliminating token drift can establish a foundation of trust, reliability, and scale that transforms how businesses can deploy AI.
For Deloitte and its clients
The new NVIDIA-enabled approach enhances accuracy and reliability within Deloitte’s enterprise agentic portfolio, called Zora AI™.
For enterprises
This breakthrough mitigates a major barrier to operational AI, enabling mission-critical workflows with accuracy and repeatability. It also unlocks automation in CI/CD pipelines and continuous evaluation frameworks.
For technology leaders
Eliminating token drift makes enterprise AI more predictable, testable, and safe, offering a clear path to deployment in regulated industries.
For market innovation
With reliable inference and execution, enterprises can scale advanced training, fine-tuning, and evaluation methods, accelerating time-to-market for new applications.