Organisations rely on systems to operate effectively – to build competitive advantage, maintain operations and avoid adverse outcomes. Indeed, properly functioning systems are essential if organisations are to avoid an otherwise inevitable drift to disorder. This is as true for corporate governance systems as it is for other organisational systems. In this practical article, Partner Jonathan Cheyne from our Board Advisory & Governance group introduces the famous Swiss Cheese Model of incident causation – which is widely applied in many other domains – and highlights the insights the model provides for those interested in building and maintaining strong corporate governance systems and more resilient organisations.
We’ve all experienced it – when someone makes an observation that encapsulates what we felt we knew all along but had yet to (at least so eloquently) articulate, or that would, so we charitably tell ourselves, have been obvious to us had we given the matter any thought. Paradoxically, though, such clarity of thought is often the product of deep reflection, enquiry and perception.
So it is with the so-called Swiss Cheese Model (SCM) of incident causation, formulated by the late James Reason.[1] Well known in the field of safety (notably in the extractive industries, medicine and aviation) from which it originates, the SCM rests on key principles that have applications in other domains, including corporate governance.
The SCM explained
The SCM is a model for analysing the causes of incidents. It illustrates how adverse outcomes (accidents) often arise out of the interaction of multiple elements in an organisational system.
A basic premise of the SCM is that humans are inherently fallible and errors are to be expected. This being so, a system needs adequate defences (barriers or safeguards) designed to prevent errors from occurring or, if they do occur, to limit or localise their impact. If a serious accident or error does occur, this is seen as being primarily a result of systemic factors, i.e., a failure in multiple defences. As one author has put it in an accident context, the site of a major incident can perhaps be more helpfully considered a spectacle of mismanagement than the fault of a particular individual.[2]
The slices
The SCM visualises an organisation’s defences against errors/accidents/threats as layers of Swiss cheese, each slice representing a different defence. Defences can include policies, procedures, processes, barriers or structures. For example, in a governance context, an organisation’s defences might include governance structures and reporting lines; the experience and expertise of its personnel; a specific corporate culture (although this can also be a hole); its various policies and procedures; its processes for collecting, analysing and reporting information/data; internal and external audit processes; delegation standards; and so on.
The holes
The distinctive holes in each slice of cheese represent the imperfections or gaps in, or vulnerable features of, the relevant defence. These holes – whose locations are not necessarily fixed, and which may open or close over time depending on the circumstances – may be systemic in nature (Reason called them “latent conditions”) or arise from active failures.
Latent conditions are inherent features of the system – Reason used the word “pathogen” – that can render the system vulnerable to adverse effects. They arise from the decisions made by those who can influence aspects of the design and operation of the system itself – the board, senior management, procedure writers and so on. Latent conditions can introduce vulnerabilities in two ways.
First, latent conditions can create or foster an environment in which adverse events are more likely to occur. An example in a governance context could be an obsessive senior management focus on sales or revenue growth, reinforced by aligned incentive structures and accompanied by less concern for whether sales are profitable or fit a particular risk profile. Such an environment both encourages riskier behaviour and reduces the prospect of proper oversight.[3] Other examples could include cost-cutting measures that lead to a loss of experienced staff, or pressure to increase productivity that encourages cuts in quality control.
Second, latent conditions can create long-lasting holes or weaknesses in the defences themselves: there are procedures, but they are not entirely fit for purpose; there is a board of directors, but none of its members are independent; there is an internal audit function, but its purview is too restricted; there is a remuneration policy, but the incentive structure is flawed; and so on. By themselves, these do not result in adverse outcomes, but in the right conditions, they can facilitate unwanted outcomes.
Active failures are the result of a person’s (or thing’s) interaction with the system and can take a variety of forms: an oversight, a lapse in judgement, a mistake, recklessness, or a deliberate or wilful violation of procedure or exploitation of a weakness.
So, for example, in the case of Australia’s largest corporate collapse, that of HIH Insurance Limited, overpaying for acquisitions, inadequate provisioning, imprudent overseas expansions and ill-conceived divestments could all be considered active failures that contributed to the company’s downfall. The systemic factors – the latent conditions – that facilitated these poor decisions included (this is by no means an exhaustive list) a lack of strategic thinking or awareness at board level, poor risk-management systems, a corporate culture characterised by a dominant CEO to whom both the board and senior management were unduly deferential, a lack of understanding of fundamental risks and their measurement, poor governance processes and documentation, conflicts of interest and inadequate board leadership.[4]
Defence in depth
One means by which the system may be strengthened is to layer defences: if there is a failure in one or more defences, most of the time a threat is neutralised, or its effects are mitigated, by another defence in the system. Thus, even if a threat penetrates one or more defensive holes, it will often collide with a solid barrier in another defensive layer. The existence of holes in each defensive layer does not normally lead to negative outcomes.[5] Occasionally, however, the holes in each defence align in such a way that the defences collectively prove inadequate to overcome a threat to the system, resulting in an adverse outcome or event.
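To make the layering logic concrete, the following minimal sketch (in Python) simulates threats passing through a stack of defences. It is purely illustrative and not part of Reason’s model: it assumes each layer fails with a fixed probability, and the names and figures used (threat_penetrates, penetration_rate, hole_probability, correlation, the trial counts) are arbitrary assumptions chosen only to show the shape of the effect.

```python
import random

def threat_penetrates(num_layers: int, hole_probability: float,
                      correlation: float = 0.0) -> bool:
    """Simulate one threat passing through layered defences.

    Each layer independently fails (has a 'hole' in the threat's path) with
    probability `hole_probability`. `correlation` crudely models holes that
    tend to align: with that probability, a single shared weakness decides
    the outcome for every layer at once. All values are illustrative
    assumptions, not empirical estimates.
    """
    if random.random() < correlation:
        # Aligned holes: one shared weakness decides the outcome for all layers.
        return random.random() < hole_probability
    # Independent layers: the threat must find a hole in every single layer.
    return all(random.random() < hole_probability for _ in range(num_layers))

def penetration_rate(num_layers: int, hole_probability: float,
                     correlation: float = 0.0, trials: int = 100_000) -> float:
    """Estimate the proportion of threats that defeat all defensive layers."""
    hits = sum(threat_penetrates(num_layers, hole_probability, correlation)
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for layers in (1, 2, 3, 5):
        independent = penetration_rate(layers, hole_probability=0.2)
        aligned = penetration_rate(layers, hole_probability=0.2, correlation=0.3)
        print(f"{layers} layer(s): independent ~ {independent:.4f}, "
              f"partially aligned ~ {aligned:.4f}")
```

Run with these assumed figures, the penetration rate for independent layers falls roughly geometrically as layers are added (about 0.2, 0.04, 0.008 and so on), whereas once the holes tend to align the extra layers add far less protection – which is precisely the alignment scenario the SCM warns against.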
Insights
Although the SCM was originally formulated to help explain how accidents in an industrial or medical context can occur, it is, by virtue of its focus on the interaction of elements in an organisational system, of broader application. Underpinning the model are a number of important insights and observations:
- Major adverse outcomes are rarely the result of a single cause. Often, if not in the vast majority of cases, major adverse outcomes result from a series of smaller errors or issues which in isolation may not cause a major problem, but which combine, cascade or compound into something much more significant, sufficient to overwhelm the system defences. As the Paul Kelly and Kev Carmody song, familiar to many Australians, goes: “from little things, big things grow.”
- We cannot alter the human condition, but we can alter the conditions in which humans work. Given our inherent fallibility as human beings – and thus that errors, poor behaviour and lapses in judgement are to be expected – and the inherent riskiness of many organisational activities, a key consideration is to ensure that systems have multiple defences (that is, redundancy). So, for example, a risk-management policy should be accompanied by some process(es) for verifying compliance with it. When there is a failure, a key focus of enquiry should be on why the system failed. A myopic focus on “who” failed, or seeking to attribute fault to a singular root cause, may do little to address the systemic risk.
- Systemic issues can often be identified and resolved before an adverse event occurs. It may be hard, or impossible, to predict when an active failure will occur – when a particular person will make an error or poor decision, when someone decides to commit an act of fraud and so on. By contrast, many systemic defects can be identified and addressed proactively. This is how high reliability organisations (HROs) – manufacturers of pharmaceuticals, nuclear power plants, air traffic control towers – can consistently operate under very demanding conditions for extremely long periods without incident. A full explanation of how HROs do this is beyond the scope of this article, but, in summary, Weick & Sutcliffe, in their book Managing the Unexpected: Sustained Performance in a Complex World,[6] identified five traits that HROs have in common:
- HROs have a preoccupation with failure and continually examine their processes for potential weaknesses, listening for, and acting upon, weak signals of system dysfunction;
- HROs recognise that their activities are complex and that complex problems frequently require complex solutions. This manifests in a willingness to constantly challenge existing beliefs and mine data to establish benchmarks and assess performance;
- HROs are cognisant that detailed knowledge of their operations exists not at the top of the organisational hierarchy, but with the people close to daily operations. HROs foster an environment of openness – of psychological safety – in which there is regular and open communication with employees, encouraging them to share concerns, ensuring their reports and suggestions are taken seriously, and providing feedback when information is shared;
- HROs prioritise expertise over authority, cultivate diversity, and devolve and distribute decision-making authority rather than concentrating it; and
- HROs complement their anticipatory activities by developing organisational resilience, that is, an ability to detect, respond to and contain inevitable problems.[7]
- The greater the number of holes in the defences, or the larger those holes are, the weaker the defence and the system in general. In other words, to build a more resilient system, the focus should be on improving the defences, by reducing or eliminating systemic weaknesses and by adding additional layers of defence. However, there is a paradox. Adding layers and increasing the coupling of elements in a system creates further complexity, which can itself be a latent defect that renders the system as a whole more vulnerable. Resource constraints aside, there is therefore a balance to be struck between risk-reduction efforts and the complexity they introduce.
Some limitations in the metaphor and concluding observations
The Swiss Cheese Model visualises a threat as having a linear trajectory – that is, an adverse outcome is the result of a successive series of failures in system defences in response to one or more events or acts. The linearity of the diagram suggests a chain of causation – of events or breaches of defences – connected, in some fashion, in time or place. This, together with the suggestion that systemic or latent defects are capable of being identified and fixed, implies that elements in the organisational system are connected in such a way that we can predict a result (with varying degrees of confidence) and, based on those predictions, prescribe solutions.[8]
However, organisations are complex systems. They comprise a variety of separate systems, all interacting with one another, with feedback loops and similar connections to the outside world. Predicting the outcome of interactions in complex systems can be extremely difficult, if not impossible.[9] Thus, at the time (rather than with the benefit and bias of hindsight), threats may not be perceptible, their impact may be unpredictable, and the means by which they could be mitigated may therefore be indeterminable.[10]
A further issue is the difference between the system as designed, and the system in practice. The way a system – say a governance and risk-management system – is described on paper, and how it actually operates in practice, are often very different. At the end of the day, it is the system as it operates that needs to be understood (as far as this is possible) and optimised.
Effectively understanding what is not working and why, and learning from “near misses”, is exceptionally difficult. Organisational resilience requires an ability to anticipate, respond to, learn from and adapt to unforeseen events and circumstances.[11] This is a topic to which we will return in a subsequent article.
Despite these limitations, the Swiss Cheese Model is a useful tool both for conceptualising the components of a governance (or other organisational) system and for evaluating failures and near misses.
[1] Reason, J. (2000). Human error: models and management. British Medical Journal, [online] 320(7237), pp. 768–770. doi: https://doi.org/10.1136/bmj.320.7237.768.
[2] Kudritzki J. and Corning, A. (2015). Examining and Learning from Complex Systems Failures. [online] Uptime Institute Blog. Available at: https://journal.uptimeinstitute.com/examing-and-learning-from-complex-systems-failures/.
[3] See, for example, the case study of Washington Mutual Bank, whose collapse in 2008 was the largest bank failure in US history, in Weick, K.E. and Sutcliffe, K.M. (2015). Managing the Unexpected: Sustained Performance in a Complex World (Third Edition). Hoboken, New Jersey: Wiley.
[4] Australia. HIH Royal Commission & Owen, N.J. (2003). The failure of HIH Insurance. Commonwealth of Australia, Canberra, viewed 18 October 2024, http://nla.gov.au/nla.obj-21341685681.
[5] See also Cook, R. (2002). How complex systems fail. [online] Available at: https://www.researchgate.net/publication/228797158_How_complex_systems_fail.
[6] See note 3.
[7] For a very interesting overview of some of the key concepts of “Resilience Engineering”, see: Geraghty, T. (2020). Resilience Engineering and DevOps – A Deeper Dive. [online] Tom Geraghty Blog. Available at: https://tomgeraghty.co.uk/index.php/resilience-engineering-and-devops/.
[8] Carlisle, Y. and McMillan, E. (2002). Thinking differently about strategy: comparing paradigms. [online] Available at: https://oro.open.ac.uk/7499/1/Thinking_Differently_Final.pdf.
[9] Snowden, D. and Boone, M. (2007). A Leader’s Framework for Decision Making. [online] Harvard Business Review. Available at: https://hbr.org/2007/11/a-leaders-framework-for-decision-making.
[10] For example, Bishr Tabbaa argues that the complexity of computer hardware and software systems has exceeded our current understanding of how these systems work and fail: Tabbaa, B. (2020). System Failure: How complexity and convexity make the World fragile. [online] Medium. Available at: https://medium.com/dataseries/complex-system-failure-the-whole-is-more-than-the-sum-of-its-parts-ac1ee9bc4e6c.
[11] Geraghty, T. (2020) Resilience Engineering and DevOps – A Deeper Dive. [online] Available at: https://tomgeraghty.co.uk/index.php/resilience-engineering-and-devops/; Woods, D.D. (2018). Resilience is a Verb. [online] Available at: https://www.researchgate.net/publication/329035477_Resilience_is_a_Verb.