Welcome to the third and final article in our three-part series, ‘Stronger Together’. In the previous two articles, we outlined a definition for Third Party Resilience and the foundational components of building and operating the framework. We highlighted the critical importance of building robust resilience standards for Third Parties and embedding these into the TPRM lifecycle from RfX through to exit. We also looked at the criteria that might prompt more intensive analysis and treatment of Third Parties as part of the framework. In this final article, we consider how the framework needs to be varied to cater for different types of Third Parties and how Third Party Resilience can be sustained over time, including through the use of automation and AI. We leave you with five no-regret actions that all firms should consider as they embark on their Third Party Resilience programmes.
Many of our clients will find that the recommendations in this article work well up to a point but become more challenging to apply when it comes to certain categories of Third Parties.
The framework does need to be sensitive to three broad categories, which include:
However, firms also need to segment the population into categories below these levels, which may include Cloud Service Providers (CSPs), telecommunications, Financial Market Infrastructure (FMI), Financial Market Utilities (FMU), Software-as-a-Service (SaaS) providers etc.
The most common challenges we see with applying the framework are with hyperscalers, CSPs and FMIs. From our work with these organisations, this is not typically a question of reticence or amenability but rather stems from five key factors:
Let’s break these factors down a bit further with a few salient examples. One of the more frequent complaints we hear about Third Party Resilience is around poor responses to DDQs from larger organisations. Whilst knowledge is sometimes a barrier in these instances, more often it is a question of how the DDQ has been formulated. For example, the FSI may ask about the resilience of a specific engagement, but a CSP may not recognise this as a useful formulation of a question when its services are all architected and designed on common, resilient infrastructure. Refracting the response to give an answer for one specific service with any degree of precision is then not possible. In issuing the DDQ, FSIs need to consider which questions can be answered through open sources of information (e.g. public disclosures such as PFMI statements in the case of FMIs) and which questions they need to evaluate more forensically. In doing so, they need to consider how the DDQ evaluator can be supported with robust resilience training and access to technical Subject Matter Experts. We are now also seeing the use of both generative and agentic AI and LLMs to help evaluate returns and identify trends in submissions. This in turn is being used to optimise question sets so that these can be more forensic in the areas that really make a difference.
In terms of testing, Third Parties are likely to differ in their willingness to participate in bilateral testing, and there may be challenges in defining suitable scenarios. In practice, we see a willingness from some CSPs and hyperscalers to engage in bilateral testing where practicable, but the tests often do not progress, due either to bandwidth and logistics or to a lack of consensus on plausible scenarios. We believe that plausibility will continue to be a challenging concept for the industry but strongly advise that organisations do not equate plausibility with risk likelihood. Plausible scenarios are defined based on the belief that they could happen, not necessarily the proof that they have. This nuance is critical and should be made clear to Third Parties to encourage them to stretch the severity parameters of a given scenario, if it is believed that the FSI requires this to prove the ITOL.
In the case of FMIs, bilateral testing is often not possible due to their pervasiveness within the industry. In this case, we advocate that firms work closely with the regulators and trade bodies to understand how market tests can be used or adapted for specific business needs. FSIs will be most interested in what proportion of the service can be guaranteed to them during a disruption and, for the most part, they should expect FMIs to communicate the general principles that they follow, rather than specific metrics (for example, rather than guaranteeing a number of payments that might be processed, they may discuss the principles and parameters of payments prioritisation).
However, FSIs should also give thought to how their own profile within the market conditions how they would be expected to respond to an FMI failure. For example, a Global Systemically Important Bank will be expected to manage cascading systemic impacts in a way that a domestic building society is not. Therefore, it is crucial that FSIs with IBSs that meet the threshold conditions for market-wide impact have classified and prioritised these appropriately. They should then determine how FMIs, CSPs and other non-substitutable Third Parties support that subset of systemically important services. These priorities need to be articulated to FMIs and systemic providers so that there is transparency over the cascading impact of failure. We can think of this scenario as a kind of multiplier of impact.
In turn, FMIs will need to use this information to influence the prioritisation that they provide back into the market as normal levels of service are resumed. Realistically, it is unlikely that they will be able to definitively apportion volumes (e.g. number of payments processed), though it may be possible to provide ranges and outline the principles and parameters for prioritisation.
To overcome other testing barriers, firms can explore technology solutions and accelerators such as digital twin modelling, digital runbook solutions and advanced technology resilience techniques like chaos engineering. They can also evaluate their own appetite for exercising their right to audit as an additional assurance mechanism and explore pooled audit as an optional efficiency for increasing confidence over resilience practices.
Most of our clients are considering how they sustain resilience activities in Business As Usual (BAU) now that major compliance deadlines in the UK and EU have passed. Most of our conversations centre on how resilience can be an organisational advantage, and clients are therefore looking to their Third Parties to help them move beyond the minimum regulatory expectations. Almost all of our clients have found that the volume of activity driven both by supervisory requirements and by their own vision for resilience is not sustainable on a ‘traditional’, labour-intensive operating model. Many of our discussions therefore now centre on first simplifying, then automating the resilience programme so that it can enable better quality outcomes with less manual effort. There is also an increased drive to have business ownership of resilience activities and outcomes.
To achieve this, firms need to work back from the value to the business and establish a set of clear Third Party Resilience Objectives and Key Results (OKRs) that link back to business strategy. Those might include reduced Third Party incident rates, lower procurement costs, innovative solutions from new entrants or startups or better speed to market for Third Party supported products.
All of these outcomes rely on more effective execution of the resilience framework as well as buy-in from the extended enterprise. To do this, firms should give thought to the culture of engagement that they are building with their Third Parties and move beyond transactional approaches to foster genuine partnerships. They should explore opportunities to collaborate with key Third Parties on resilience initiatives e.g. sharing leading practices, conducting joint exercises, and co-developing contingency plans. Firms should communicate the value proposition of resilience to their suppliers, highlighting its benefits in terms of reputation, customer retention, and competitive advantage as well as partnership longevity.
To demonstrate more accurately that outcomes are being achieved, firms should define a set of clear and meaningful KRIs and KPIs for resilience and work with suppliers to generate good quality data, ideally in a format that can be ingested into the broader Operational Resilience tool set for analytical purposes. They should also think about how AI and automation might help with acceleration and scale challenges. For example, AI may be used to help support a better understanding of the Third Party’s resilience posture, counteracting bias and misunderstanding in the DDQ evaluations performed by assessors. It may also help to streamline workflows and automate notifications (e.g. Third Party incident management) to the broad set of functions and teams involved in delivery.
As part of a broader Operational Resilience data model and tooling strategy, firms should create a common repository for all assets (including Third Parties and individual engagements), rating them for their importance to the enterprise based on their impact if unavailable or lost. In some cases, this might also mean making changes to other systems (e.g. the CMDB may need to be uplifted to include a business reference architecture that also includes Technology Third Party service ownership). They should also have a comprehensive contract and Third Party management system that creates traceability of resilience standards and contractual commitments from sourcing to offboarding.
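For readers who want to picture what such a common repository might look like in practice, the following is a minimal, illustrative sketch in Python. It is not a reference design: the class names, the four-level impact scale and the rule that a Third Party inherits the highest rating of its engagements are all assumptions made for the purposes of the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List, Optional


class Impact(IntEnum):
    """Hypothetical four-level scale for impact if the asset is unavailable or lost."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Asset:
    asset_id: str
    name: str
    kind: str                        # e.g. "third_party" or "engagement" (illustrative values)
    impact_if_lost: Impact
    parent_id: Optional[str] = None  # links an engagement to its Third Party


class AssetRepository:
    """A single in-memory register of Third Parties and their engagements."""

    def __init__(self) -> None:
        self._assets: Dict[str, Asset] = {}

    def register(self, asset: Asset) -> None:
        if asset.asset_id in self._assets:
            raise ValueError(f"duplicate asset id: {asset.asset_id}")
        if asset.parent_id is not None and asset.parent_id not in self._assets:
            raise KeyError(f"unknown parent: {asset.parent_id}")
        self._assets[asset.asset_id] = asset

    def engagements_for(self, third_party_id: str) -> List[Asset]:
        return [a for a in self._assets.values() if a.parent_id == third_party_id]

    def effective_impact(self, asset_id: str) -> Impact:
        # Assumption: an asset is at least as important as its most
        # important child engagement.
        asset = self._assets[asset_id]
        ratings = [asset.impact_if_lost]
        ratings += [child.impact_if_lost for child in self.engagements_for(asset_id)]
        return max(ratings)

    def ranked(self) -> List[Asset]:
        # Most important first; ties are broken by asset id for a stable order.
        return sorted(
            self._assets.values(),
            key=lambda a: (-self.effective_impact(a.asset_id), a.asset_id),
        )
```

In a real programme this register would more likely sit in the CMDB or a dedicated resilience tool than in code, but the same two ideas apply: every Third Party and engagement has one entry, and importance is derived from impact rather than from contract value.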
Third Party Resilience is one of the most important outcomes our clients can achieve as part of their resilience activities and it is likely that Boards, ExCos and regulators will continue to focus their attention here for some time.
In our experience, there are five no-regret actions that FSIs can undertake when embarking on a Third Party Resilience uplift journey:
In developing and sustaining Third Party Resilience, one word will be key: Collaboration. The more successful organisations that we work with recognise that Third Party Resilience requires a shared vision for resilience on the part of both the service recipient and service provider. They also recognise that Third Party Resilience programmes very often fail because of functional silos. Having clarity of outcomes, endorsement from the top and reconciliation of taxonomies, methodologies and objectives will be critical for firms seeking to get this right.
For more information about how we can help you with our Third Party Resilience, Operational Resilience, or TPRM services, please reach out to the team.