Welcome to the next article in our three-part series on Third Party Resilience, ‘Stronger Together’. In the last article, we explored a definition for Third Party Resilience and discussed how building resilience-by-design concepts into the framework will be key for firms seeking to enhance their approach to this important topic. We considered four key principles of resilience by design and outlined important factors for successful execution, such as alignment of taxonomies and terminology, as well as how resilience has shifted from plans and playbooks to designing and deploying fault-tolerant services. In this article, we consider how to apply the framework and emphasise the critical importance of taking a proportionate but scientific approach when evaluating resilience in the source-to-contract phases. We also consider how training and education are a sometimes overlooked but essential part of getting all of this right.
Determining the scope of the Third Party Resilience population is one challenge; understanding how to vary the Third Party Resilience framework to best treat that population is another. Most organisations do not have Operational Resilience and TPRM teams that are sized to treat all of these service providers equally (nor would we advocate that they do so), so it is critical that they consider how treatment can be proportionate and risk-sensitive. To some extent, firms will want to have resilience standards for all of the Third Parties that they work with, but those that support Important or Material business services will require focussed attention and the greater proportion of a firm’s resources.
We recognise five key variables that influence how the Third Party Resilience framework is practically applied and where to prioritise time, effort and attention. These include:
These factors, independently and together, should inform the level of preliminary assurance as well as the ongoing oversight applied to the Third Party service population. Oversight and assurance are not a one-time or periodic endeavour but a continuous process that requires embedding resilience expectations into the TPRM lifecycle.
Across the industry, we see less time spent on resilience in the source-to-contract phases than is optimal. Firms should give thought to how they can be more proportionate but also more scientific as they frame the IRQ and DDQ. This will sometimes require them to reconcile differing methodologies (for example, determining inherent and residual risk is difficult for resilience, which is indifferent to likelihood and, at this early stage in a procurement cycle, does not necessarily factor in mitigations and contingencies). Firms should consider how they gradually build knowledge of a supplier’s resilience through these phases and also reflect on whether they will be amplifying over-reliance in a way that may be outside risk appetite. Performing enhanced Due Diligence on Third Parties is crucial where service disruption has been observed in the industry or declared as part of a DDQ submission, or where there are other heightened risk factors. This may include tabletop walkthroughs of controls and procedures even before contracting.
In post-contract phases, it is insufficient to oversee Third Party Resilience solely through contractual provisions and ongoing monitoring and management. Firms need to commit to using horizon scanning capabilities to supplement their understanding of the overall operational and financial health of the service. They also need to commit to a programme of testing and exercising with third parties, either annually or on a rolling testing cycle, to determine whether contractual commitments hold up in practice. Periodic testing and exercising helps to build mutual confidence that organisations can absorb and adapt following disruptive events. It also helps to clarify roles, responsibilities and plans, and demonstrates whether recovery objectives and impact tolerances can be met. We commonly see misunderstandings around Service Level Agreements that are only identified in the course of testing. For example, Recovery Time Objective SLAs need to support the impact tolerance (ITOL), and that means understanding the Recovery Time Actual: the full elapsed time from acknowledging, triaging and classifying an incident through to user restoration. Bilateral testing, where feasible, provides an opportunity to confirm the resilience of the service, stress contingencies and probe what minimum service level the third party might be able to offer during severe or prolonged disruption.
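The point about Recovery Time Actual can be illustrated with a small worked example. The sketch below is a minimal illustration, not a prescribed method: the phase names follow the sentence above, while the class, the function, the field names and every duration are hypothetical values chosen for the example. It assumes the supplier measures its Recovery Time Objective against the restoration phase only, which is one way the misunderstanding described above can arise.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTimeline:
    """Elapsed time per incident phase (hypothetical structure for illustration)."""
    acknowledge: timedelta
    triage: timedelta
    classify: timedelta
    restore: timedelta  # technical recovery through to user restoration

    def recovery_time_actual(self) -> timedelta:
        # End-to-end elapsed time, from first acknowledgement to user restoration.
        return self.acknowledge + self.triage + self.classify + self.restore


def assess(timeline: RecoveryTimeline,
           contractual_rto: timedelta,
           impact_tolerance: timedelta) -> dict:
    """Compare a tested timeline with the contractual RTO and the firm's ITOL."""
    rta = timeline.recovery_time_actual()
    return {
        "recovery_time_actual": rta,
        # Assumption: the supplier starts its RTO clock only once restoration begins.
        "restore_phase_meets_rto": timeline.restore <= contractual_rto,
        # The firm's tolerance applies to the whole disruption, not one phase.
        "within_itol": rta <= impact_tolerance,
        "headroom_against_itol": impact_tolerance - rta,
    }


# Illustrative figures only; none of these come from a real test or contract.
observed = RecoveryTimeline(
    acknowledge=timedelta(minutes=30),
    triage=timedelta(minutes=45),
    classify=timedelta(minutes=15),
    restore=timedelta(hours=4),
)
result = assess(
    observed,
    contractual_rto=timedelta(hours=4),
    impact_tolerance=timedelta(hours=5),
)
```

With these invented figures the supplier meets a four-hour RTO when only the restoration phase is counted, yet the end-to-end Recovery Time Actual is five and a half hours and exceeds a five-hour impact tolerance. A gap of this kind typically surfaces only when a test times the whole incident rather than a single phase.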
Establishing the appropriate contractual mechanisms to support joint Third Party stress-testing is advised for new engagements but, for existing ones, firms should look to partner with suppliers to build amenability to participate. Where amenability of third parties is a challenge, they should consider what the barriers are and identify practical solutions. This might include simulating the Third Party’s response and subsequently validating this with them; virtualising the stress-testing process through digital collaboration spaces; or running tests over multiple smaller sessions rather than a long, mass participation one.
For firms moving towards digitisation of testing, simulating impacts and the effectiveness of internal contingencies using historical data and digital twin stress-testing techniques may help to strengthen understanding of prior events and of how well contingencies performed. Firms may also wish to explore whether community groups and industry forums can support pooled testing of specific services within their individual sectors to find logical points of commonality, as well as using the results of pooled audits to support their understanding. Where the firm has a high degree of reliance on confirming a Third Party’s resilience through bilateral testing, but finds that amenability is lacking, this should be documented within the firm’s self-assessment to provide transparency to the regulators over the limitations (and over what mechanisms, such as simulations, have been used to partially mitigate the gap).
It is essential that Third Party Resilience operates from a clear service catalogue and RACI with a defined ownership structure and operating model. As part of this, firms will need to clarify where Third Party Resilience should sit and, since this is often between multiple teams, having activity-level ownership and accountability defined is critical to its success. Establishing a Third Party Resilience Governance framework to ensure alignment between teams, and making this widely accessible within the business, is important and will help to provide clarity for the key roles driving the framework. This should not necessarily require the establishment of new governance forums but rather consideration of how best to use existing TPRM and Operational Resilience forums without duplication.
In practice, most organisations have an engagement or service manager with single accountability for a specific third party service. These individuals should typically sit within the business and have a good working knowledge of the way in which the service is going to be used within the context of the business process that it supports. This means that they will be well-placed to drive resilience activities such as continuity and contingency planning for the loss or temporary failure of the service.
It is therefore key that these individuals are empowered and trained in the foundations of Third Party Resilience and have supporting reference materials to help them to navigate the engagement, using the service review cycle to best effect. Equipping engagement managers with crib sheets and aides-memoire can help them to use the monthly service review meeting to understand any material control changes that might lead to disruption, rather than the more perfunctory use of those meetings (e.g. discussing service credits) that we sometimes see. We find that documenting a consumable playbook for Third Party engagement managers is an effective way to help them understand the regulatory requirements and expectations from sourcing and selection through to exit and disengagement. Similarly, playbooks for the prolonged disruption of suppliers can be a useful artefact for business service owners and executives navigating Third Party scenarios where ITOLs are exceeded.
Training should also be provided to other roles such as Important Business Service Owners (or their delegates); IT Service and Product Owners; and Procurement and Sourcing team members. In making funding requests for Third Party Resilience programmes, firms should anticipate the costs of embedding the framework through training and education and also factor in time and costs associated with running town hall style events with their Third Parties to educate them on any changes or uplift to the firm’s own resilience standards.
Stay tuned! We know that many of our clients will find that the recommendations in this article ring true up to a point but become more challenging to apply when it comes to certain categories of Third Parties, such as FMIs. In our next article, we will explore how we need to adapt the framework for different Third Party types and how to sustain Third Party Resilience once the key components are in place. We will also outline five no-regret actions that all FSIs should take when embarking on a Third Party Resilience journey.