In this resource, Alberto G. Alexander puts operational resilience and business continuity in context and provides an overview of the components necessary for an operational resilience management system.
Operational resilience is an organization’s ability to remain flexible in any impactful situation, including the ability to alter operations in the face of changing business conditions. An operationally resilient organization is able to recover its critical or important business services from a significant unplanned disruption, while minimizing impact and protecting its customers and the integrity of its operations. The first step in becoming operationally resilient is accepting that disruptive events will occur, and that these events will need to be managed effectively. This article analyzes the components of operational resilience, describes how to build them, and presents best practices.
Introduction: addressing the differences between business continuity and operational resilience
Business continuity and operational resilience are often confused. Because operational resilience is too often used almost interchangeably with business continuity, it’s worth exploring the differences between the two.
Operational resilience refers to: “An organization’s ability to withstand, adapt to, and recover from disruptive events,” Brookbanks and Gandy, 2002. It focuses on the ability to bounce back from disruptions. It is a broad concept that encompasses both business continuity management and other resilience-related activities, such as crisis management, disaster recovery, and cyber resilience. Business continuity management (BCM), on the other hand, refers to identifying potential disruptions to an organization’s critical operations and developing plans to ensure continuity. BCM is therefore a subset of operational resilience that focuses on keeping critical business operations running during a disruption.
Business continuity management is a vital part of an organization’s planning, but operational resilience is the foundational element that allows the organization to continue to adapt to a changing environment in the long term. “While BCM comes into play immediately, resilience helps the company to continually change, adapt, and improve, in order to keep pace with an ever-changing eco system,” Pykhova, 2021.
Components of operational resilience
Operational resilience has certain components that need to be well understood.
Operational resilience allows an organization to keep working amid adversity. It encompasses the entire organization, including operations, finance, cyber security, and compliance. The critical components of operational resilience are depicted in figure one. These components are the main ingredients to support the development of operational resilience in an organization.
Figure one: components of operational resilience. Source: RiskOptics, 2021.
The following presents a brief description of each component.
Governance
Having the right people and business processes to govern a company’s strategy is crucial to ensure operational resilience.
Companies should use their existing governance processes to design, monitor, and implement an effective operational resilience strategy that allows the business to respond to, adapt to, recover from, and learn from disruptive events.
Robust governance and controls result in clear roles and responsibilities, visible results and risks for key decision-makers, and well-supported employees and senior management, who are then enabled to fulfill their duties.
Operational risk management
Leverage the company’s operational risk management functions to identify external and internal threats. Risk management frameworks help to identify potential failures in people, processes, and systems on an ongoing basis.
Companies should have: “Sufficient controls and procedures so they can quickly identify and assess threats, vulnerabilities, and overall operational risk,” Ashby, 2022. The ability to take immediate action helps prevent problems from affecting the conduct of critical operations.
Risk oversight functions (compliance, legal, IT security, and so forth) should periodically evaluate the effectiveness of the controls and procedures they’ve implemented. These assessments should also be carried out whenever changes are made to critical operations. Lessons-learned reviews should be conducted after incidents to identify their root causes and to reduce the risk of recurrence.
Business continuity planning and testing
Companies should have business continuity plans and run BC exercises to assess their capacity to carry out vital activities and disaster recovery in various unusual but plausible situations.
An effective business continuity plan must be forward-looking in assessing the impact of potential disruptions. Business continuity drills should be conducted to validate system access and connectivity for various possible disruptive events.
These plans should be comprehensive, incorporating business impact analyses and recovery strategies. Testing and training foster employee preparedness. The communication plans and crisis management teams should be prepared ahead of time, to ensure that the company has the infrastructure for a quick response.
Mapping interconnections and interdependencies
Once critical operations have been identified, the internal and external interconnections and interdependencies required to perform those essential functions need to be mapped.
“The map identifies and documents the people, technology, processes, information, facilities, and relationships necessary to carry out the company’s critical operations,” Girling, 2022. It is important not to forget about third parties or intra-group arrangements and their relationships to mission-critical functions in the organization.
The map developed should be specific enough to identify vulnerabilities and complexities in critical operations that could be at risk in the event of a disruption.
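As a rough illustration of how such a map can expose vulnerabilities, the sketch below records each critical operation with the resources it depends on and then lists the resources shared by more than one operation. The operation and resource names are hypothetical, and a real map would also cover information, facilities, and intra-group arrangements.

```python
from collections import defaultdict

# Hypothetical map: each critical operation and the resources it depends on.
DEPENDENCY_MAP = {
    "payments": {"core-banking-system", "payments-team", "cloud-provider-x"},
    "customer-support": {"crm-system", "call-centre", "cloud-provider-x"},
    "payroll": {"hr-system", "payroll-vendor"},
}


def shared_dependencies(dependency_map, min_operations=2):
    """Return resources that at least `min_operations` critical operations rely on."""
    users = defaultdict(set)
    for operation, resources in dependency_map.items():
        for resource in resources:
            users[resource].add(operation)
    return {
        resource: sorted(operations)
        for resource, operations in users.items()
        if len(operations) >= min_operations
    }


concentrations = shared_dependencies(DEPENDENCY_MAP)
print(concentrations)
# {'cloud-provider-x': ['customer-support', 'payments']}
```

A resource that appears here is a concentration point: its failure would disrupt several critical operations at once, so it deserves priority when controls and alternatives are planned.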
Self-assessment
A periodic self-assessment allows business leaders to identify and evaluate risks and their associated controls collectively. Risk assessments add value by increasing an operating unit’s involvement in designing risk control systems, identifying risk exposures, and determining corrective actions.
The components of operational resilience need to be developed and monitored.
Building operational resilience
The following provides a description of the steps needed to build operational resilience in a particular organization.
It is important to understand that operational resilience includes the processes, systems, characteristics, and techniques that businesses use to recover from adverse events. “Whether it is a natural disaster or complex compliance requirements, operational resilience enables the business to overcome unexpected challenges and thrive,” Ashby, 2022. To help the business prepare for and prevent disruptions, six steps for building operational resilience in any type of organization are required:
Identify key business services
The entire organization needs to be mapped, identifying its different elements: people, objectives, processes, technology, facilities, and other resources. The map needs to be aligned with the risk appetite to establish the organization’s resilience and tolerance levels. The map gives a better understanding of the services.
Mapping the organization can also reveal which services would cause the greatest damage if disrupted. Moreover, the map can help identify the processes, systems, people, and related third parties that the organization’s services depend on. This process can determine which resources would be critical in ensuring continuous service delivery.
Set impact tolerances
A maximum tolerable level of disruption (MTLD) needs to be created for each of the organization’s services for scenario testing and risk analysis. “The MTLD refers to the maximum tolerable level of disruption to an important business service, including the maximum tolerable duration of a disruption,” Chapelle, 2019. By setting impact tolerances, organizations are able to determine the point at which intolerable harm occurs to customers (or in the case of financial sector firms when a risk is posed to the orderly functioning of the wider financial markets). Thus, by anticipating scenarios in which harm may occur, organizations can operate within their impact tolerances. This approach is premised on the idea that setting impact tolerances helps boards and senior management prepare for inevitable disruptions regardless of their likelihood.
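To make the idea concrete, the following minimal sketch records an impact tolerance for one service and checks whether an outage stays within it. It assumes, for simplicity, that the tolerance is expressed only as a maximum duration; the class name, the service, and the four-hour figure are hypothetical, and a real MTLD would usually cover other measures of harm as well.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ImpactTolerance:
    """Hypothetical record of the tolerance set for one business service."""
    service: str
    max_duration: timedelta  # maximum tolerable duration of a disruption


def is_breached(tolerance, outage):
    """Return True when an outage lasts longer than the tolerable duration."""
    return outage > tolerance.max_duration


payments = ImpactTolerance("payments", timedelta(hours=4))
short_outage = is_breached(payments, timedelta(hours=2))
long_outage = is_breached(payments, timedelta(hours=6))
print(short_outage, long_outage)  # False True
```

Recording the tolerance as data, rather than leaving it in a policy document, lets the same figure be reused in scenario testing and in reporting to the board.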
Perform risk assessments
Risk assessments need to be performed on all the organizational components that have an impact on the delivery of the key products/services. All the possible risks that could create a business disruption need to be identified and actions for mitigation implemented.
Vulnerabilities are those organizational conditions or weaknesses that can be exploited by a risk and cause a business disruption.
The most important vulnerabilities of the different components that contribute to the delivery of the products/services need to be identified and actions taken to mitigate or remove the vulnerability.
Build in flexibility
Ultimate flexibility means having viable alternatives in any situation. Standardization of parts, processes, and production systems, so that these elements are interchangeable, creates options for using them where there is a shortfall. In the event of a disruption, organizations can substitute alternative parts, swap out damaged components, use alternative processes, or reroute the flow of business activities.
Test with risk scenarios
Risk scenarios provide valuable data to identify possible points of failure and validate the MTLD identified. Analyzing scenarios can determine the weak links across the organization.
Creating risk scenarios helps the organization to determine the extent of the impacts a disruption can cause and the possible outcomes. When aligned with the business objectives, risk scenarios can also help understand the role of stakeholders in mitigating risks.
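One simple way to picture a scenario test is sketched below: a scenario names the resources that fail and for how long, and each service is checked against its MTLD. All names and figures are hypothetical, and the sketch assumes that a service is down for as long as its longest-failing dependency, with no workaround available.

```python
# Hypothetical services with their dependencies and MTLD (in hours).
SERVICES = {
    "payments": {
        "depends_on": {"core-banking-system", "cloud-provider-x"},
        "mtld_hours": 4,
    },
    "payroll": {
        "depends_on": {"hr-system", "payroll-vendor"},
        "mtld_hours": 48,
    },
}

# Hypothetical scenario: resource -> estimated hours unavailable.
SCENARIO = {
    "name": "regional cloud outage",
    "failed_resources": {"cloud-provider-x": 12, "hr-system": 6},
}


def run_scenario(services, scenario):
    """Estimate each service's outage and compare it with its MTLD."""
    failed = scenario["failed_resources"]
    results = {}
    for name, service in services.items():
        hit = service["depends_on"].intersection(failed)
        # Assumption: the outage equals the longest failure among dependencies.
        outage = max((failed[resource] for resource in hit), default=0)
        results[name] = {
            "affected_by": sorted(hit),
            "outage_hours": outage,
            "within_tolerance": outage <= service["mtld_hours"],
        }
    return results


results = run_scenario(SERVICES, SCENARIO)
for service_name, outcome in results.items():
    print(service_name, outcome)
```

In this example the payments service would exceed its tolerance while payroll would not, which points to where alternatives or additional controls are most needed.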
Learn from disruptions
Focusing on the learning experiences the organization can get from disruptions provides key information to increase adaptability. Learning from disruptive events gives critical insights and outcomes that enable the organization to proactively prepare for and prevent disruptions.
Best practices to strengthen operational resilience
The practices outlined below provide an approach that organizations can use to strengthen and maintain their operational resilience.
Governance
Effective governance helps ensure that organizations not only operate in a safe and sound manner and comply with applicable laws and regulations, but also maintain operational resilience. In keeping with existing regulations and guidance, the practices outlined below promote effective governance:
- The organization’s board of directors works with senior management to confirm that operational resilience practices are led and staffed by individuals with relevant expertise, approve appropriate budgets and resources, and promote a culture of effective risk management.
- The organization’s board of directors oversees the organization’s management of operational risk in its business line operations, its independent operational risk management function, and its independent internal (or external) audit function.
- Senior management is accountable for developing, implementing, and managing effective and resilient information systems and controls, as appropriate, to maintain critical operations and core business lines consistent with the organization’s tolerance for disruption.
Operational risk management
By identifying, managing, and mitigating operational risk exposures related to internal processes, people, systems, external threats, and third parties, an organization is able to strengthen its operational resilience. Effective operational risk management involves close engagement by the organization’s senior management, business line operations, independent operational risk management function, and independent internal (or external) audit function. The practices outlined below promote effective operational risk management:
- The organization’s senior management oversees the implementation of operational risk management processes, systems, and controls to identify and contain the scope of a disruption, mitigate its effects, and resolve the disruption consistent with the organization’s tolerance for disruption.
- The organization’s business line operations management identifies and mitigates operational risk exposures in alignment with the organization’s tolerance for disruption.
- The organization’s operational risk management function works closely with its business continuity management and recovery or resolution planning functions with respect to operational resilience efforts.
Business continuity management
Business continuity plans consider market and enterprise-wide stresses and idiosyncratic risks that can imperil the continuity of an organization’s critical operations and core business lines or otherwise have a broader impact on the financial system.
The practices outlined below promote strong business continuity management:
- The organization’s business continuity management incorporates business analysis, testing, training and awareness programs, as well as communication and crisis management policies.
- The organization periodically reviews its business continuity plans to ensure contingency strategies remain consistent with current operations, risks and threats, its tolerance for disruption, and recovery priorities.
- The organization tests business continuity plans, reviews the execution of tests, and improves plans by incorporating lessons learned.
Third-party risk management
Recognition of third-party risk is vital to operational resilience, especially if outsourcing arrangements involve entities that perform critical operations or core business activities. In keeping with existing regulations and guidance, the practices outlined below promote management of third-party risk:
- The organization identifies and analyzes third-party risk of critical operations and core business lines. It prioritizes third-party dependencies that are most significant to the organization and understands, manages, and mitigates its risks.
- The organization establishes relationships with third parties through formal agreements. The organization manages and monitors the performance of third parties against its service requirements and its tolerance for disruption.
- The organization periodically reviews reports of systems and controls and summaries of test results or other equivalent assessments of third parties. It establishes processes and benchmarks for monitoring a third party’s ability to continue to deliver services during disruptions.
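The monitoring described above could be supported by a check along the following lines, which compares figures reported by third parties with the organization’s service requirements. This is only a sketch: the vendors, the metrics, and the thresholds are hypothetical, and real benchmarks would come from the formal agreements with each third party.

```python
# Hypothetical service requirements derived from the tolerance for disruption.
REQUIREMENTS = {"availability_pct": 99.5, "max_recovery_hours": 8}

# Hypothetical figures taken from third-party reports and test summaries.
REPORTS = [
    {"vendor": "payroll-vendor", "availability_pct": 99.9, "recovery_test_hours": 5},
    {"vendor": "cloud-provider-x", "availability_pct": 99.2, "recovery_test_hours": 10},
]


def review_third_parties(reports, requirements):
    """Return, per vendor, the requirements that the reported figures miss."""
    findings = {}
    for report in reports:
        issues = []
        if report["availability_pct"] < requirements["availability_pct"]:
            issues.append("availability below requirement")
        if report["recovery_test_hours"] > requirements["max_recovery_hours"]:
            issues.append("recovery test exceeded tolerance")
        if issues:
            findings[report["vendor"]] = issues
    return findings


findings = review_third_parties(REPORTS, REQUIREMENTS)
print(findings)
```

Vendors that appear in the findings would be candidates for follow-up under the relevant agreement.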
Scenario analysis
Scenario analysis helps an organization to develop, validate, and calibrate its tolerance for disruption. Organizations may integrate the analysis with disaster recovery and business continuity management for use in assessing operational resilience. In keeping with existing regulations and guidance, the practices outlined below promote effective scenario analysis:
- Operational risks identified by the organization’s operational risk management function, independent internal (or external) audit function, business continuity management, and recovery or resolution planning activities should be incorporated, as applicable, into severe but plausible scenarios affecting the organization’s critical operations and core business lines. The organization designs the scenarios so that they may be used to test the organization’s tolerance for disruption.
- The organization maintains a robust governance framework and independent review function to oversee the integrity and consistency of the scenario development process.
- In designing scenarios, the organization leverages both the mapped interconnections and interdependencies of its critical operations and core business lines including its third-party risks, set forth in its recovery or resolution plans, as well as relevant business impact analyses.
- The organization uses scenario analysis to back-test against past instances of severe disruption. The results of back-testing are used to refine scenarios and increase their effectiveness for future use.
Secure and resilient information system management
Secure and resilient information systems underpin the operational resilience of an organization’s critical operations and core business lines. The appropriate implementation, use, and protection of information systems can help an organization identify and detect risks to operational resilience.
The practices outlined below promote secure and resilient information systems:
- Information systems, including elements that depend on third parties, supporting the organization’s critical operations and core business lines are subject to robust risk identification, protection, detection, and response and recovery programs that are regularly tested. Information systems incorporate appropriate situational awareness and provide management with relevant information on a timely basis.
- The organization routinely applies and evaluates the effectiveness of processes and controls to protect the confidentiality, integrity, availability, and overall security of the organization’s data and information systems.
- The organization establishes controls to safeguard the integrity and availability of critical data against the impact of destructive malware, including ransomware, or other similar threats.
Surveillance and reporting
Operational resilience entails ongoing surveillance and reporting of operational risks and dissemination of that information to the board of directors and relevant stakeholders across the organization. In keeping with existing regulations and guidance, the practices outlined below promote this:
- The organization identifies and monitors ongoing exposure to operational risk relative to its risk appetite and tolerance for disruption. The organization establishes and maintains appropriate communication and coordination procedures to inform all relevant areas of the organization’s ongoing exposures.
- The organization detects in a timely manner anomalous activity that could lead to a disruption affecting the organization’s critical operations and core business lines, and it assesses the potential impact of the activity together with the effectiveness of protective measures.
- The organization conducts continuous surveillance and reporting to senior management and the board of directors that provides sufficient data and information for timely and appropriate decisions regarding measures to respond to a disruption.
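As one illustration of timely detection, the sketch below flags a new reading that deviates sharply from recent history. It uses a simple z-score rule, which is an assumption made for this example and not a method prescribed by the practices above; the metric (daily failed logins) and the threshold of three standard deviations are also hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` when it lies more than `threshold` standard deviations
    from the mean of `history` (a simple z-score rule)."""
    if len(history) < 2:
        return False  # not enough data to estimate variability
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily counts of failed logins on a critical system.
failed_logins = [12, 15, 11, 14, 13, 12, 16]
normal_day = is_anomalous(failed_logins, 15)
spike_day = is_anomalous(failed_logins, 90)
print(normal_day, spike_day)  # False True
```

A flagged reading would then be assessed for its potential impact and reported to senior management as described above.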
Conclusion
Operational resilience focuses on building resilience through leadership, culture, and preparedness so that the business is ready to handle changes at all levels. The organization’s board members and top management are responsible for developing a strategic vision that encompasses the broader business context and how often external factors may impact specific business processes. From there they will continue to lead the process of identifying the components of operational resilience and the actions required for building it.
Operational resilience is closely related but not identical to BCM. BCM’s scope is a subset of operational resilience and focuses on maintaining critical business operations continuity during disruption. Operational resilience uses a larger scope that incorporates external factors and the ways that business processes are tied together. An organization that is trying to become operationally resilient might look at how events indirectly related to the business could cause a chain of other events that eventually disrupts operations.
Operational resilience includes, but is more than, business recovery; it is a change in mindset, culture, and approach that drives the implementation of resilient measures and practices throughout the business. Building a resilient organization requires a shift from a reactive to a proactive posture, and resilience must be built into the very fabric of the organization – from its culture to how the company operates both internally and across the extended third-party ecosystem.
The shift from recovery to resilience requires a cultural change starting at the top. This cannot only be the responsibility of a small team – it requires focus throughout the entire organization.
All strategic levels in the organization have to play a leading role in developing the components of operational resilience, following the steps for building it, and consistently implementing best practices.
References
- Brookbanks, Mike and Gandy, Tony. Operational Resilience: The Art of Risk Management, IBM, 2002.
- Pykhova, Elena. Operational Risk Management in Financial Services: A Practical Guide to Establishing Effective Solutions, Kogan Page Limited, 2021.
- Chapelle, Ariane. Operational Risk Management, Wiley Series, 2019.
- Ashby, Simon. Fundamentals of Operational Risk Management: Understanding and Implementing Effective Tools, Policies and Frameworks, Kogan Page Limited, 2022.
- Girling, Phillippa. Operational Risk Management: A Complete Guide for Banking and Fintech 2nd Edition, Wiley, 2022.
Dr. Alberto G. Alexander holds a Ph.D. from The University of Kansas and an M.A. from Northern Michigan University. He is an MBCI, BCMS IRCA Lead Auditor and Approved Tutor. He is the managing director of the international consulting and managerial training firm Eficiencia Gerencial y Productividad SAC, located in Lima, Peru, and is currently a professor at the Graduate Business School of ESAN University, Lima, Peru. He can be contacted at: email@example.com