Skip to main content

Securing Large Language Models: From threat to trust

A practical framework for securing AI

Authors:

  • Maurice Schubert | Partner, Cyber
  • Yasser Aboukir | Director, Cyber 

Large Language Models (LLMs) are transforming enterprise operations, but their adoption introduces new and distinct attack surfaces. From prompt injection to manipulation of autonomous agents, adversaries are exploiting these weaknesses with alarming success, creating a tangible link between vulnerability to trust.

In this article, we highlight three critical areas for business and security leaders:

  • Understanding AI-specific threats: How LLM vulnerabilities differ from traditional software security risks.
  • New testing standards: Applying the 2025 OWASP AI Testing Guide across application, model, data, and infrastructure layers.
  • Strategic defense: Implementing a "Shift Left" approach that combines foundational hygiene with AI-specific controls.

Drawing on proven threat modelling methodologies and real-world AI red-teaming, we show why strong security fundamentals remain the bedrock of both enterprise security and AI trust.

Introduction  

Large Language Models (LLMs) have reached production, and their security implications demand immediate operational focus. Organizations are rapidly integrating these systems into customer service, code generation, and decision support. Yet LLMs introduce attack vectors that are fundamentally different from those of traditional applications. The OWASP Top 10 for LLM Applications 2025 marks a clear turning point, and the newly published OWASP AI Testing Guide operationalizes this shift: AI trustworthiness, not security alone, is now the objective.

A useful analogy is the fictional "Order 66" scenario: trusted systems turned against their users by a single command. LLMs dramatically lower the barrier to such abuse, no cryptographic keys are required, only the right words. Field experience, however, consistently shows that these novel threats tend to layer on top of traditional weaknesses, making a "back to basics" security posture just as critical as advanced AI-specific defenses.

The new reality of LLM threats

Unlike conventional software vulnerabilities rooted in memory safety or input validation, LLM weaknesses arise from how models process and interpret language. OWASP identifies prompt injection as the most critical risk, driven by the absence of a strict separation between instructions and data.

  • Direct prompt injection occurs when attackers craft inputs that bypass safeguards or override system.
  • Indirect prompt injections embed malicious instructions in external content, such as websites, documents, or emails, that the model later processes.

Recent studies report 41%-56%1 vulnerability rates to injection attacks across state-of-the-art models, underscoring how systemic this issue remains.

The threat landscape expands further with the rise of Agentic AI. The 2025 OWASP list introduces risks such as excessive agency, where autonomous LLMs are authorized to invoke tools, call APIs, or interact with other systems. Research shows that 82,4%2 of AI agents execute malicious commands when requested by another agent, even when mediated by protocols such as MCP or agent-to-agent (A2A) controls.

When combined with sensitive information disclosure, including leaked prompts, proprietary data, or configuration secrets, the risk profile becomes both novel and severe. A misconfigured LLM API can result in large-scale data exposure, mirroring classic configuration failures but at far greater speed and with significantly reduced detectability.

 

Expanding attack surfaces: Plugins and supply chains

Two additional threat vectors deserve immediate attention.

  • Insecure plugin and tool design: Trusted connectors can be abused to escalate privileges, invoke unauthorized actions, or exfiltrate sensitive data.
  • Model supply chains risk: Deploying third-party models without verification introduces the possibility of poisoned training data, backdoored weights, or compromised repositories. These attacks execute at inference time and routinely evade traditional security testing.

We have observed incidents involving malicious weights injected into public model repositories, compromised development pipelines and adversarial examples embedded directly into training datasets. These threats are particularly insidious because they persist silently and bypass application-layer controls. Organizations must therefore treat LLM procurement and deployment with the same rigor applied to software bill of materials (SBOM), and third-party code review.

A four-layer framework for trust

OWASP's four-layer testing framework provides a practical structure for addressing these risks:

  • AI application layer: Test user interfaces and orchestration logic. This includes business logic abuse, prompt injection paths, and authorization boundaries. Can an attacker manipulate a chatbot into triggering unauthorized actions?
  • AI model layer: Assess the robustness of the LLM itself against jailbreaks, context poisoning, and model extraction. Do system prompts and safety layers reliably withstand adversarial inputs?
  • AI data layer: Evaluate the integrity and security of training data, embeddings, and vector databases. Testing focuses on data lineage, access control, and anomaly detection across ingestion pipelines.
  • AI infrastructure layer: Validate cloud configuration, IAM policies, and runtime isolation. A permissive role that poses limited risk in a traditional application can become catastrophic when assigned to an autonomous agent.
Figure 1: From hardware to AI services: A full value chain

Figure 1: From hardware to AI services: A full value chain

Critically, these layers do not operate in isolation. A vulnerability in one layer can amplify risks across others. For example, excessive agency at the application layer, combined with insufficient model safety guardrails and over permissive infrastructure access, can create a cascading failure with systemic impact.

Effective testing must therefore validate interactions between layers, how data, authority, and control flow from infrastructure through model inference to application logic, in order to uncover cross-boundary risks that would remain invisible in siloed assessments.

While the OWASP Testing Guide provides concrete test cases for each layer, practitioners must adapt them to their specific organizational context. A financial institution's risk profile differs fundamentally from that of a healthcare provider, just as the threat model for an autonomous agent differs from that of a customer-facing chatbot. The framework should be tailored to the deployment model, data sensitivity, regulatory obligations, and the threat actors most likely to target the organization’s assets.

From defense strategies to action

Securing LLM applications requires embedding security across the entire lifecycle, shifting from a reactive, "bolt-on" model to a secure-by-design philosophy.

Effective defense depends on layered countermeasures. Technical controls begin with secure system prompt design, explicitly instructing models to reject override and role-confusion attempts. This should be reinforced with input and output filtering to detect injection patterns before they reach production. Architectural controls focus on sandboxing and least privilege. Field experience consistently shows that AI security fails when fundamentals are ignored: rigorous Identity and Access Management (IAM) and cloud configuration remain are non-negotiable prerequisites.

To build trust, organizations should take the following actions:

  • Map attack vectors using established threat-modeling frameworks such as STRIDE or MITRE ATLAS.
  • Adopt "Shift Left" testing and conduct regular AI red-teaming (black-box and grey-box) early in the development lifecycle.
  • Maintain AI Bills Of Materials (AI-BOMs) to track model provenance and dependencies (e.g., OWASP CycloneDX).
  • Integrate continuous monitoring to detect anomalous behavior in production.

Beyond these foundational controls, implement model-specific safeguards. Enforce rate limiting to constrain prompt-injection attempts and resource-exhaustion attacks. Deploy output validation to scan responses for indicators of compromise, leaked credentials, system prompts, or unauthorized data references. Use semantic validation, leveraging secondary models or rule engines, to detect subtle jailbreaks and policy evasions.

At the infrastructure layer, enforce least-privilege access using cloud-native controls: tightly scoped IAM roles assigned to API keys, restrict container permissions, and audit all calls to LLM services.

For data protection, encrypt sensitive inputs before submission, apply anonymization or federated learning where feasible, and enforce strict access controls on vector databases. Finally, establish AI-specific incident response procedures, including playbooks for model manipulation, data-poisoning detection, and rapid rollback of compromised models.

Governance and operational controls are equally critical. Unauthorized or rogue LLM deployments can silently consume cloud resources, leak data, or violate regulatory requirements. A single user error, selecting the wrong model, exposing sensitive data, or triggering runaway costs, can lead to fines or costly remediation in the absence of guardrails. Organizations should therefore maintain a centralized AI asset inventory through AIOps platforms, enforce approval workflows for model deployment, and implement financial controls to prevent uncontrolled spending and risk accumulation.

Conclusion 

LLMs present both transformative opportunity and material risk. The associated attack techniques are not theoretical; adversaries are actively exploiting them today. Secure deployment, however, is achievable. Organizations that combine structured frameworks such as those from OWASP with rigorous, multi-layer testing and layered technical controls are already deploying LLM systems to production with managed risk.

The imperative is clear: LLM security must be treated with the same rigor as any other critical enterprise system. By validating defenses across all four layers before production deployment, security becomes an operational capability, not a source of friction, but an enabler of trust and scale.

"Field experience confirms that these novel threats typically layer on top of traditional weaknesses, making a 'back to basics' approach just as critical as advanced AI defenses."

"Organizations are integrating these powerful systems into customer service, code generation, and decision support. Yet LLMs introduce attack vectors fundamentally different from traditional applications."

Discover our Future of Advice Blog Homepage

1 Benjamin, V., et al. (2025). Systematically Analysing Prompt Injection Vulnerabilities in Diverse LLM Architectures. ICCWS, 20(1).  DOI: 10.34190/iccws.20.1.3292 Lupinacci, G., et al. (2025). "The Dark Side of LLMs: Agent-based Attacks for Complete Malware Deployment on Victim Machines." arXiv:2507.06850v5

2 Lupinacci, M., Pironti, F. A., Blefari, F., Romeo, F., Arena, L., & Furfaro, A. (2025). "The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover." arXiv preprint arXiv:2507.06850v5. University of Calabria & IMT School for Advanced Studies. July 2025 (updated September 2025).top1000funds, “How next-gen investors at GIC, Temasek harness AI potential”, 9 April 2025.

Did you find this useful?

Thanks for your feedback