Risk Analysis
A quantitative guide

David Vose

third edition

John Wiley & Sons, Ltd

Copyright © 2008

David Vose

Published by

John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester,
West Sussex, PO19 8SQ, England
Telephone +44 (0)1243 779777

Email (for orders and customer service enquiries): cs-books@wiley.co.uk
Visit our Home Page on www.wileyeurope.com or www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright,
Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham
Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be
addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19
8SQ, England, or emailed to permreq@wiley.co.uk, or faxed to +44 (0)1243 770620.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product
names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The
Publisher is not associated with any product or vendor mentioned in this book.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold
on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert
assistance is required, the services of a competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 6045 Freemont Blvd, Mississauga, Ontario, Canada, L5R 4J3
Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print may not be available in electronic books.

Library of Congress Cataloging-in-Publication Data
Vose, David.
Risk analysis : a quantitative guide / David Vose. - 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-51284-5 (cloth : alk. paper)
1. Monte Carlo method. 2. Risk assessment-Mathematical models. I.
Title.
QA298.V67 2008
658.4'0352 - dc22

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-470-51284-5 (H/B)
Typeset in 10/12pt Times by Laserwords Private Limited, Chennai, India
Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire

Contents

Preface

Part 1 Introduction

1 Why do a risk analysis?
1.1 Moving on from "What If" Scenarios
1.2 The Risk Analysis Process
1.3 Risk Management Options
1.4 Evaluating Risk Management Options
1.5 Inefficiencies in Transferring Risks to Others
1.6 Risk Registers

2 Planning a risk analysis
2.1 Questions and Motives
2.2 Determine the Assumptions that are Acceptable or Required
2.3 Time and Timing
2.4 You'll Need a Good Risk Analyst or Team

3 The quality of a risk analysis
3.1 The Reasons Why a Risk Analysis can be Terrible
3.2 Communicating the Quality of Data Used in a Risk Analysis
3.3 Level of Criticality
3.4 The Biggest Uncertainty in a Risk Analysis
3.5 Iterate

4 Choice of model structure
4.1 Software Tools and the Models they Build
4.2 Calculation Methods
4.3 Uncertainty and Variability
4.4 How Monte Carlo Simulation Works
4.5 Simulation Modelling

5 Understanding and using the results of a risk analysis
5.1 Writing a Risk Analysis Report
5.2 Explaining a Model's Assumptions
5.3 Graphical Presentation of a Model's Results
5.4 Statistical Methods of Analysing Results

Part 2 Introduction

6 Probability mathematics and simulation
6.1 Probability Distribution Equations
6.2 The Definition of "Probability"
6.3 Probability Rules
6.4 Statistical Measures

7 Building and running a model
7.1 Model Design and Scope
7.2 Building Models that are Easy to Check and Modify
7.3 Building Models that are Efficient
7.4 Most Common Modelling Errors

8 Some basic random processes
8.1 Introduction
8.2 The Binomial Process
8.3 The Poisson Process
8.4 The Hypergeometric Process
8.5 Central Limit Theorem
8.6 Renewal Processes
8.7 Mixture Distributions
8.8 Martingales
8.9 Miscellaneous Examples

9 Data and statistics
9.1 Classical Statistics
9.2 Bayesian Inference
9.3 The Bootstrap
9.4 Maximum Entropy Principle
9.5 Which Technique Should You Use?
9.6 Adding Uncertainty in Simple Linear Least-Squares Regression Analysis

10 Fitting distributions to data
10.1 Analysing the Properties of the Observed Data
10.2 Fitting a Non-Parametric Distribution to the Observed Data
10.3 Fitting a First-Order Parametric Distribution to Observed Data
10.4 Fitting a Second-Order Parametric Distribution to Observed Data

11 Sums of random variables
11.1 The Basic Problem
11.2 Aggregate Distributions

12 Forecasting with uncertainty
12.1 The Properties of a Time Series Forecast
12.2 Common Financial Time Series Models
12.3 Autoregressive Models
12.4 Markov Chain Models
12.5 Birth and Death Models
12.6 Time Series Projection of Events Occurring Randomly in Time
12.7 Time Series Models with Leading Indicators
12.8 Comparing Forecasting Fits for Different Models
12.9 Long-Term Forecasting

13 Modelling correlation and dependencies
13.1 Introduction
13.2 Rank Order Correlation
13.3 Copulas
13.4 The Envelope Method
13.5 Multiple Correlation Using a Look-Up Table

14 Eliciting from expert opinion
14.1 Introduction
14.2 Sources of Error in Subjective Estimation
14.3 Modelling Techniques
14.4 Calibrating Subject Matter Experts
14.5 Conducting a Brainstorming Session
14.6 Conducting the Interview

15 Testing and modelling causal relationships
15.1 Campylobacter Example
15.2 Types of Model to Analyse Data
15.3 From Risk Factors to Causes
15.4 Evaluating Evidence
15.5 The Limits of Causal Arguments
15.6 An Example of a Qualitative Causal Analysis
15.7 Is Causal Analysis Essential?

16 Optimisation in risk analysis
16.1 Introduction
16.2 Optimisation Methods
16.3 Risk Analysis Modelling and Optimisation
16.4 Working Example: Optimal Allocation of Mineral Pots

17 Checking and validating a model
17.1 Spreadsheet Model Errors
17.2 Checking Model Behaviour
17.3 Comparing Predictions Against Reality

18 Discounted cashflow modelling
18.1 Useful Time Series Models of Sales and Market Size
18.2 Summing Random Variables
18.3 Summing Variable Margins on Variable Revenues
18.4 Financial Measures in Risk Analysis

19 Project risk analysis
19.1 Cost Risk Analysis
19.2 Schedule Risk Analysis
19.3 Portfolios of Risks
19.4 Cascading Risks

20 Insurance and finance risk analysis modelling
20.1 Operational Risk Modelling
20.2 Credit Risk
20.3 Credit Ratings and Markov Chain Models
20.4 Other Areas of Financial Risk
20.5 Measures of Risk
20.6 Term Life Insurance
20.7 Accident Insurance
20.8 Modelling a Correlated Insurance Portfolio
20.9 Modelling Extremes
20.10 Premium Calculations

21 Microbial food safety risk assessment
21.1 Growth and Attenuation Models
21.2 Dose-Response Models
21.3 Is Monte Carlo Simulation the Right Approach?
21.4 Some Model Simplifications

22 Animal import risk assessment
22.1 Testing for an Infected Animal
22.2 Estimating True Prevalence in a Population
22.3 Importing Problems
22.4 Confidence of Detecting an Infected Group
22.5 Miscellaneous Animal Health and Food Safety Problems

I Guide for lecturers

II About ModelRisk

III A compendium of distributions
III.1 Discrete and Continuous Distributions
III.2 Bounded and Unbounded Distributions
III.3 Parametric and Non-Parametric Distributions
III.4 Univariate and Multivariate Distributions
III.5 Lists of Applications and the Most Useful Distributions
III.6 How to Read Probability Distribution Equations
III.7 The Distributions
III.8 Introduction to Creating Your Own Distributions
III.9 Approximation of One Distribution with Another
III.10 Recursive Formulae for Discrete Distributions
III.11 A Visual Observation on the Behaviour of Distributions

IV Further reading

V Vose Consulting

References

Index

Preface
I'll try to keep it short.
This third edition is an almost complete rewrite. I have thrown out anything from the second edition
that was really of pure academic interest - but that wasn't very much, and I had a lot of new topics I
wanted to include, so this edition is quite a bit bigger. I apologise if you had to pay postage.
There are two main reasons why there is so much material to add since 2000. The first is that our
consultancy firm has grown considerably, and, with the extra staff and talent, we have had the privilege
of working on more ambitious and varied projects. We have particularly expanded in the insurance
and finance markets, so you will see that a lot of techniques from those areas, which have far wider
applications, appear throughout this edition. We have had contracts where we were given carte blanche
to think up new ideas, and that really got the creative juices flowing. I have also been involved in writing
and editing various risk analysis guidelines that made me think more about the disconnect between what
risk analysts produce and what risk managers need. This edition is split into two parts in an attempt to
help remedy that problem.
The second reason is that we have built a really great software team, and the freedom to design our
own tools has been a double espresso for our collective imagination. We now build a lot of bespoke risk
analysis applications for clients and have our own commercial software products. It has been enormous
fun starting off with a typical risk-based problem, researching techniques that would solve that problem
if only they were easy to use and then working out how to make that happen. ModelRisk is the result,
and we have a few others in the pipeline.

Some thank yous . . .
I have imposed a lot on Veerle and our children to get this book done. V has spent plenty of evenings
without me while I typed away in my office, but I think she suffered much more living with a guy who
was perpetually distracted by what he was going to write next. Sophie and Sébastien have also missed
out. Papa always seemed to be working instead of playing with them. Worse, perhaps, it didn't stop
raining all summer in Belgium, and they had to forego a holiday in the sun so I could finish writing.
I'll make it up to all three of you, I promise.
I have the luxury of having some really smart and motivated people working with me. I have leaned
rather heavily on the partners and staff in our consultancy firm while I focused on this book, particularly
on Huybert Groenendaal who has largely run the company in my "absence". He also wrote Appendix 5.
Timour Koupeev heads our programming team and has been infinitely patient in converting my never-ending ideas for our ModelRisk software into reality. He also wrote Appendix 2. Murat Tomaev, our
head programmer, has made it all work together. Getting new modules for me to look at always feels
a little like Christmas.


My secretary, Jane Pooley, retired from the company this year. She was the first person with enough
faith to risk working for me, and I couldn't have wished for a better start.
Wouter Smet and Michael van Hauwermeiren in our Belgian office have been a great support, going
through the manuscript and models for this book. Michael wrote the enormous Appendix 3, which could
be a book in its own right, and Wouter offered many suggestions for improving the English, which is
embarrassing considering it's his third language.
Francisco Zagmutt wrote Chapter 16 while under pressure to finish his thesis for his second doctorate
and being a full-time, jumping-on-airplanes, deadline-chasing senior consultant in our US office.
When Wiley sent me copies of the first edition, the first thing I did was go over to my parents' house
and give them a copy. I did the same with the second edition, and the Japanese version too. They are
all proudly displayed in the sitting room. I will be doing the same with this book. There's little that can
beat knowing my parents are proud of me, as I am of them. Mum still plays tennis, rides and competes
in target shooting. Dad is still a great golfer and neither ever seems to stop working on their house,
unless they're off to a party. They are a constant reminder to make the most of life.
Paul Curtis copy-edited the manuscript with great diligence and diplomacy. I'd love to know how he
spotted inconsistencies and repetitions in parts of the text that were a hundred or more pages apart. Any
remaining errors are all my fault.
Finally, have you ever watched those TV programmes where some guy with a long beard is teaching
you how to paint in thirty minutes? I did once. He didn't have a landscape in front of him, so he just
started painting what he felt like: a lake, then some hills, the sky, trees. He built up his painting, and
after about 20 minutes I thought - yes, that's finished. Then he added reflections, some snow, a bush
or two in the foreground. Each time I thought - yes, now it's finished. That's the problem with writing
a book (or software) - there's always something more to add or change or rewrite. So I have rather
exceeded my deadline, and certainly the page estimate, and my thanks go to my editor at Wiley, Emma
Cooper, for her gentle pushing, encouragement and flexibility.

Part 1
Introduction
The first part of this book is focused on helping those who have to make decisions in the face of risk.
The second part of the book focuses on modelling techniques and has all the mathematics. The purpose
of Part 1 is to help a manager understand what a risk analysis is and how it can help in decision-making.
I offer some thoughts on how to build a risk analysis team, how to evaluate the quality of the analysis
and how to ask the right questions so you get the most useful answers.
This section should also be of use to analysts because they need to understand the managers' viewpoint
and work towards the same goal.

Chapter 1

Why do a risk analysis?
In business and government one constantly has to make decisions where the outcome is
uncertain. Understanding the uncertainty can help us make a much better decision. Imagine that you
are a national healthcare provider considering which of two vaccines to purchase. The two vaccines
have the same reported level of efficacy (67 %), but further study reveals that there is a difference in
confidence attached to these two performance measures: one is twice as uncertain as the other (see
Figure 1.1).
All else being equal, the healthcare provider would purchase the vaccine with the smallest uncertainty
about its performance (vaccine A). Replace vaccine with investment and efficacy with profit and we
have a problem in business, for which the answer is the same - pick the investment with the smallest
uncertainty, all else being equal (investment A). The principal problem is determining that uncertainty,
which is the central focus of this book.
We can think of two forms of uncertainty that we have to deal with in risk analysis. The first is a
general sense that the quantity we are trying to estimate has some uncertainty attached to it. This is
usually described by a distribution like the ones in Figure 1.1. Then we have risk events, which are
random events that may or may not occur and for which there is some impact of interest to us. We can
distinguish between two types of event:
A risk is a random event that may possibly occur and, if it did occur, would have a negative
impact on the goals of the organisation. Thus, a risk is composed of three elements: the scenario;
its probability of occurrence; and the size of its impact if it did occur (either a fixed value or a
distribution).
An opportunity is also a random event that may possibly occur but, if it did occur, would have a
positive impact on the goals of the organisation. Thus, an opportunity is composed of the same three
elements as a risk.
A risk and an opportunity can be considered the opposite sides of the same coin. It is usually easiest
to consider a potential event to be a risk if it would have a negative impact and its probability is less
than 50%, and, if the risk has a probability in excess of 50 %, to include it in a base plan and then
consider the opportunity of it not occurring.

1.1 Moving on from "What If" Scenarios
Single-point or deterministic modelling involves using a single "best-guess" estimate of each variable
within a model to determine the model's outcome(s). Sensitivities are then performed on the model to
determine how much that outcome might in reality vary from the model outcome. This is achieved by
selecting various combinations for each input variable. These various combinations of possible values

4

Risk Analysis

Figure 1.1 Efficacy comparison for two vaccines: the vertical axis represents how confident we are about
the true level of efficacy. I've omitted the scale to avoid some confusion at this stage (see Section 111.1.2).

around the "best guess" are commonly known as "what if" scenarios. The model is often also "stressed
by putting in values that represent worst-case scenarios.
Consider a simple problem that is just the sum of five cost items. We can use the three points,
minimum, best guess and maximum, as values to use in a "what if" analysis. Since there are five cost
items and three values per item, there are 3^5 = 243 possible "what if" combinations we could produce.
Clearly, this is too large a set of scenarios to have any practical use. This process suffers from two
other important drawbacks: only three values are being used for each variable, where they could, in
fact, take any number of values; and no recognition is being given to the fact that the best-guess value
is much more likely to occur than the minimum and maximum values. We can stress the model by
adding up the minimum costs to find the best-case scenario, and add up the maximum costs to get the
worst-case scenario, but in doing so the range is usually unrealistically large and offers no real insight.
The exception is when the worst-case scenario is still acceptable.
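
To make the combinatorial point concrete, here is a minimal sketch in Python (the five cost figures are invented for illustration, not taken from the book) that enumerates every minimum/best-guess/maximum combination and confirms there are 3^5 = 243 unweighted scenarios:

```python
from itertools import product

# Illustrative (minimum, best guess, maximum) estimates for five cost items, in $k.
cost_items = [
    (90, 100, 130),
    (45, 50, 70),
    (180, 200, 260),
    (28, 30, 45),
    (70, 80, 105),
]

# Every "what if" scenario: one of the three values chosen for each of the five items.
scenarios = [sum(combo) for combo in product(*cost_items)]

print(len(scenarios))  # 243 scenarios (3 to the power 5)
print(min(scenarios))  # best case: the sum of the minima
print(max(scenarios))  # worst case: the sum of the maxima
```

No probability weighting is attached to any of the 243 totals, which is exactly the weakness described above.
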
Quantitative risk analysis (QRA) using Monte Carlo simulation (the dominant modelling technique in
this book) is similar to "what if" scenarios in that it generates a number of possible scenarios. However,
it goes one step further by effectively accounting for every possible value that each variable could
take and weighting each possible scenario by the probability of its occurrence. QRA achieves this by
modelling each variable within a model by a probability distribution. The structure of a QRA model
is usually (there are some important exceptions) very similar to a deterministic model, with all the
multiplications, additions, etc., that link the variables together, except that each variable is represented
by a probability distribution function instead of a single value. The objective of a QRA is to calculate
the combined impact of the uncertainty¹ in the model's parameters in order to determine an uncertainty
distribution of the possible model outcomes.

¹ I discuss the exact meaning of "uncertainty", randomness, etc., in Chapter 4.
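
By way of contrast, the sketch below (again with invented figures, and using NumPy triangular distributions as a stand-in for whatever distributions a real model would justify) runs the same five-item cost sum as a Monte Carlo simulation, so every possible value of each item contributes with its appropriate probability weight:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_iterations = 10_000

# The same illustrative (minimum, most likely, maximum) estimates, now treated
# as parameters of triangular distributions rather than as fixed scenarios.
cost_items = [
    (90, 100, 130),
    (45, 50, 70),
    (180, 200, 260),
    (28, 30, 45),
    (70, 80, 105),
]

# One sampled value per item per iteration; the total cost is their sum.
samples = np.column_stack([
    rng.triangular(low, mode, high, size=n_iterations)
    for (low, mode, high) in cost_items
])
total_cost = samples.sum(axis=1)

# The result is a distribution of possible totals, not a single figure.
print(f"mean total cost     : {total_cost.mean():.1f}")
print(f"10th-90th percentile: {np.percentile(total_cost, [10, 90])}")
print(f"P(total cost > 500) : {(total_cost > 500).mean():.3f}")
```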


1.2 The Risk Analysis Process
Figure 1.2 shows a typical flow of activities in a risk analysis, leading from problem formulation to
decision. This section and those that follow provide more detail on each activity.

1.2.1 Identifying the risks
Risk identification is the first step in a complete risk analysis, given that the objectives of the decision-maker have been well defined. There are a number of techniques used to help formalise the identification
of risks. This part of a formal risk analysis will often prove to be the most informative and constructive
element of the whole process, improving company culture by encouraging greater team effort and
reducing blame, and should be executed with care. The organisations participating in a formal risk
analysis should take pains to create an open and blameless environment in which expressions of concern
and doubt can be openly given.
Figure 1.2 The risk analysis process.


Prompt lists

Prompt lists provide a set of categories of risk that are pertinent to the type of project under consideration
or the type of risk being considered by an organisation. The lists are used to help people think about
and identify risks. Sometimes different types of list are used together to improve further the chance of
identifying all of the important risks that may occur. For example, in analysing the risks to some project,
one prompt list might look at various aspects of the project (e.g. legal, commercial, technical, etc.) or
types of task involved in the project (design, construction, testing). A project plan and a work breakdown
structure, with all of the major tasks defined, are natural prompt lists. In analysing the reliability of some
manufacturing plant, a list of different types of failure (mechanical, electrical, electronic, human, etc.)
or a list of the machines or processes involved could be used. One could also cross-check with a plan of
the site or a flow diagram of the manufacturing process. Check lists can be used at the same time: these
are a series of questions one asks as a result of experience of previous problems or opportune events.
A prompt list will never be exhaustive but acts as a focus of attention in the identification of risks.
Whether a risk falls into one category or another is not important, only that the risk is identified. The
following list provides an example of a fairly general project prompt list. There will often be a number
of subsections for each category:
administration;
project acceptance;
commercial;
communication;
environmental;
financial;
knowledge and information;
legal;
management;
partner;
political;
quality;
resources;
strategic;
subcontractor;
technical.
The identified risks can then be stored in a risk register described in Section 1.6.

1.2.2 Modelling the risk problem and making appropriate decisions
This book is concerned with the modelling of identified risks and how to make decisions from those
models. In this book I try not to offer too many modelling rules. Instead, I have focused on techniques
that I hope readers will be able to put together as necessary to produce a good model of their problem.
However, there are a few basic principles that are worth adhering to. Morgan and Henrion (1990) offer
the following excellent "ten commandments" in relation to quantitative risk and policy analysis:

1. Do your homework with literature, experts and users.
2. Let the problem drive the analysis.
3. Make the analysis as simple as possible, but no simpler.
4. Identify all significant assumptions.
5. Be explicit about decision criteria and policy strategies.
6. Be explicit about uncertainties.
7. Perform systematic sensitivity and uncertainty analysis.
8. Iteratively refine the problem statement and the analysis.
9. Document clearly and completely.
10. Expose to peer review.

The responses to correctly identified and evaluated risks are many, but generally fall into the following
categories:
Increase (the project plan may be overly cautious).
Do nothing (because it would cost too much or there is nothing that can be done).
Collect more data (to better understand the risk).
Add a contingency (extra amount to budget, deadline, etc., to allow for possibility of risk).
Reduce (e.g. build in redundancy, take a less risky approach).
Share (e.g. with partner, contractor, providing they can reasonably handle the impact).
Transfer (e.g. insure, back-to-back contract).
Eliminate (e.g. do it another way).
Cancel project.
This list can be helpful in thinking of possible responses to identified risks. It should be borne in mind
that these risk responses might in turn carry secondary risks. Fall-back plans should be developed to deal
with risks that are identified and not eliminated. If done well in advance, they can help the organisation
react efficiently, calmly and in unison in a situation where blame and havoc might normally reign.

1.3 Risk Management Options
The purpose of risk analysis is to help managers better understand the risks (and opportunities) they
face and to evaluate the options available for their control. In general, risk management options can be
divided into several groups.
Acceptance (Do nothing)

Nothing is done to control the risk or one's exposure to that risk. Appropriate for risks where the cost
of control is out of proportion with the risk. It is usually appropriate for low-probability, low-impact
risks and opportunities, of which one normally has a vast list, but you may be missing some high-value
risk mitigation or avoidance options, especially where they control several risks at once. If the chosen
response is acceptance, some considerable thought should be given to risk contingency planning.


Increase

You may find that you are already spending considerable resources to manage a risk that is excessive
compared with the level of protection that it affords you. In such cases, it is logical to reduce the level
of protection and allocate the resources to manage other risks, thereby achieving a superior overall risk
efficiency. Examples are:
remove a costly safety regulation for nuclear power plants that affects a risk that would otherwise
still be miniscule;
cease the requirement to test all slaughtered cows for BSE and use saved money for hospital upgrades.
It may be logical but nonetheless politically unacceptable. There are not too many politicians or CEOs
who want to explain to the public that they've just authorised less caution in handling a risk.
Get more information

A risk analysis can describe the level of uncertainty there is about the decision problem (here we use
uncertainty as distinct from inherent randomness). Uncertainty can often be reduced by acquiring more
information (whereas randomness cannot). Thus, a decision-maker can determine that there is too much
uncertainty to make a robust decision and request that more information be collected. Using a risk
analysis model, the risk analyst can advise the least-cost method of collecting extra data that would be
needed to achieve the required level of precision. Value-of-information arguments (see Section 5.4.5)
can be used to assess how much, if any, extra information should be collected.
Avoidance (Elimination)

This involves changing a method of operation, a project plan, an investment strategy, etc., so that the
identified risk is no longer relevant. Avoidance is usually employed for high-probability, high-impact
type risks. Examples are:
use a tried and tested technology instead of the new one that was originally envisaged;
change the country location of a factory to avoid political instability;
scrap the project altogether.
Note that there may be a very real chance of introducing new (and perhaps much more important)
risks by changing your plans.
Reduction (Mitigation)

Reduction involves a range of techniques, which may be used together, to reduce the probability of the
risk, its impact or both. Examples are:
build in redundancy (standby equipment, back-up computer at different location);
perform more quality tests or inspections;
provide better training to personnel;
spread risk over several areas (portfolio effect).
Reduction strategies are used for any level of risk where the remaining risk is not of very high severity
(very high probability and impact) and where the benefits (amount by which risk is reduced) outweigh
the reduction costs.


Contingency planning

These are plans devised to optimise the response to risks should they occur. They can be used in
conjunction with acceptance and reduction strategies. A contingency plan should identify individuals
who take responsibility for monitoring the occurrence of the risk, and/or identified risk drivers for
changes in the risk's probability or possible impact. The plan should identify what to do, who should
do it and in which order, the window of opportunity, etc. Examples are:
have a trained firefighting team on site;
have a preprepared press release;
have a visible phone list (or email distribution list) of whom to contact if the risk occurs;
reduce police and emergency service leave during a strike;
fit lifeboats on ships.

Management's response to an identified risk is to add some reserve (buffer) to cover the risk should it
occur. Appropriate for small to medium impact risks. Examples are:
allocate extra funds to a project;
allocate extra time to complete a project;
have cash reserves;
have extra stock in shops for a holiday weekend;
stockpile medical and food supplies.


Insurance

Essentially, this is a risk reduction strategy, but it is so common that it is worth mentioning separately.
If an insurance company has done its numbers correctly, in a competitive market you will pay a little
above the expected cost of the risk (i.e. probability * expected impact should the risk occur). In general,
we therefore insure for risks that have an impact outside our comfort zone (i.e. where we value the risk
higher than its expected value). Alternatively, you may feel that your exposure is higher than the average
policy purchaser, in which case insurance may be under your expected cost and therefore extremely
attractive.
Risk transfer

This involves manipulating the problem so that the risk is transferred from one party to another. A
common method of transferring risk is through contracts, where some form of penalty tied to the
contractor's performance is included. The idea is appealing and used often but can be very inefficient. Examples are:
penalty clause for running over agreed schedule;
performance guarantee of product;
lease a maintained building from the builder instead of purchasing;
purchase an advertising campaign from some media body or advertising agency with payment
contingent on some agreed measure of success.


You can also consider transferring risks to you, where there is some advantage to relieving another
party of a risk. For example, if you can guarantee a second party against some small risk resultant from
an activity you wish to take that provides you with much greater benefit than the other party's risk, the
second party may remove its objection to your proposed activity.

1.4 Evaluating Risk Management Options
The manager evaluating the possible options for dealing with a defined risk issue needs to consider
many things:
What are the benefits relative to the costs associated with each risk management option?
Are there any secondary risks associated with a chosen risk management option?
How practical will it be to execute the risk management option?
Is the risk assessment of sufficient quality to be relied upon? (See Chapter 3.)
How sensitive is the ranking of each option to model uncertainties?
On this last point, we almost always would like to have better data, or greater certainty about the
form of the problem: we would like the distribution of what will happen in the future to be as narrow
as possible. However, a decision-maker cannot wait indefinitely for better data and, from a decision-analytic
point of view, may quickly reach the point where the best option has been determined and no
further data (or perhaps only a very dramatic change in knowledge of the problem) will make another
option preferable. This concept is known as decision sensitivity. For example, in Figure 1.3 the decision-maker
considers any output below a threshold T (shown with a dashed line) to be perfectly acceptable
(perhaps this is a regulatory threshold or a budget). The decision-maker would consider option A to
be completely unacceptable and option C to be perfectly fine, and would only need more information
about option B to be sure whether it was acceptable or not, in spite of all three having considerable
uncertainty.

Figure 1.3 Different possible outputs compared with a threshold T.
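
A minimal sketch of checking decision sensitivity against such a threshold, with invented normal distributions standing in for the simulated output distributions of the three options in Figure 1.3:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
threshold_T = 100.0  # illustrative regulatory limit or budget

# Simulated output distributions for the three options (parameters invented).
options = {
    "A": rng.normal(120, 10, 10_000),  # almost always above T: unacceptable
    "B": rng.normal(95, 15, 10_000),   # straddles T: more information needed
    "C": rng.normal(60, 12, 10_000),   # almost always below T: acceptable
}

for name, outcomes in options.items():
    p_exceed = (outcomes > threshold_T).mean()
    print(f"Option {name}: P(output > T) = {p_exceed:.2f}")
```

Only for option B does the answer hinge on information we do not yet have, which is the sense in which the decision is sensitive for option B alone.
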

1.5 Inefficiencies in Transferring Risks to Others
A common method of managing risks is to force or persuade another party to accept the risk on your
behalf. For example, an oil company could require that a subcontractor welding a pipeline accept the
costs to the oil company resulting from any delays they incur or any poor workmanship. The welding
company will, in all likelihood, be far smaller than the oil company, so possible penalty payments
would be catastrophic. The welding company will therefore value the risk as very high and will require
a premium greatly in excess of the expected value of the risk. On the other hand, the oil company may
be able to absorb the risk impact relatively easily, so would not value the risk as highly. The difference
in the utility of these two companies is shown in Figures 1.4 to 1.7, which demonstrate that the oil
company will pay an excessive amount to eliminate the risk.
Figure 1.4 The contractor's utility function is highly concave over the money gain/loss range in question.
That means, for example, that the contractor would value a loss of 100 units of money (e.g. $100 000) as a
vastly larger loss in absolute utility terms than a gain of $100 000 might be.

Figure 1.5 Over that same money gain/loss range, the oil company has an almost exactly linear utility
function. The contractor, required to take on a risk with an expected value of -$60 000, would value this
as -X utiles. To compensate, the contractor would have to charge an additional amount well in excess of
$100 000. The oil company, on the other hand, would value -$60 000 in rough balance with +$60 000, so
will be paying considerably in excess of its valuation of the risk to transfer it to the contractor.

Figure 1.6 Imagine the risk has a 10 % probability of occurring, and its impact would be -$300 000, to give
an expected value of -$30 000. If $300 000 is the total capital value of the contractor, it won't much matter
to the contractor whether the risk impact is $300 000 or $3 000 000 - they still go bust. This is shown by the
shortened utility curve and the horizontal dashed line for the contractor.

Figure 1.7 In this situation, the contractor now values any risk with an impact that exceeds its capital value
at a level that is less than the oil company (shown as "Discrepancy"). It may mean that the contractor can
offer a more competitive bid than another, larger contractor who would feel the full risk impact, but the oil
company will not have covered the risk it had hoped to transfer, and so again will be paying more than it
should to offload the risk. Of course, one way to avoid this problem is to require evidence from the contractor
that they have the necessary insurance or capital base to cover the risk they are being asked to absorb.

A far more realistic approach to sharing risks is through a partnership arrangement. A list of risks
that may impact on various parties involved in the project is drawn up, and for each risk one then asks:
How big is the risk?
What are the risk drivers?
Who is in control of the risk drivers? Who has the experience to control them?
Who could absorb the risk impacts?
How can we work together to manage the risks?
What arrangement would efficiently allocate the risk impacts and rewards for good risk management?
Can we insure, etc., to share risks with outsiders?
The more one can allocate ownership of risks, and opportunities, to those who control them the
better - up to the point where the owner could not reasonably bear the risk impact where others can.
Answering the questions above will help you construct a contractual arrangement that is risk efficient,
workable and tolerable to all parties.
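
Returning to the utility comparison in Figures 1.4 to 1.7, the scale of the inefficiency can be put into rough numbers. The sketch below is illustrative only: it assumes an exponential utility function U(x) = -exp(-x/R), with a small risk tolerance R for the contractor and a very large one for the near risk-neutral oil company, and values the risk described in Figure 1.6 (a 10 % chance of losing $300 000).

```python
import math

def certainty_equivalent(outcomes, probabilities, risk_tolerance):
    """Certainty equivalent of a gamble under exponential utility U(x) = -exp(-x/R)."""
    expected_utility = sum(p * -math.exp(-x / risk_tolerance)
                           for x, p in zip(outcomes, probabilities))
    return -risk_tolerance * math.log(-expected_utility)

# The risk: a 10% chance of a $300,000 loss, otherwise nothing.
outcomes = [-300_000, 0.0]
probabilities = [0.10, 0.90]
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))  # -30,000

# Illustrative risk tolerances: small for the contractor, huge for the oil company.
contractor_value = certainty_equivalent(outcomes, probabilities, risk_tolerance=200_000)
oil_company_value = certainty_equivalent(outcomes, probabilities, risk_tolerance=50_000_000)

print(f"expected value of the risk : {expected_value:,.0f}")
print(f"contractor's valuation     : {contractor_value:,.0f}")   # roughly -60,000
print(f"oil company's valuation    : {oil_company_value:,.0f}")  # close to -30,000
```

Under these assumed parameters the contractor would need a premium of roughly twice the expected cost of the risk before accepting it, while the oil company values the risk at close to its expected cost, which is why transferring it is so expensive for the oil company.
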

1.6 Risk Registers
A risk register is a document or database that lists each risk pertaining to a project or organisation,
along with a variety of information that is useful for the management of those risks. The risks listed in
a risk register will have come from some collective exercise to identify risks. The following items are
essential in any risk register entry:
date the register was last modified;
name of risk;
description of what the risk is;
description of why it would occur;
description of factors that would increase or decrease its probability of occurrence or size of impact
(risk drivers);
semi-quantitative estimates of its probability and potential impact;
P-I scores;
name of owner of the risk (the person who will assume responsibility for monitoring the risk and
effecting any risk reduction strategies that have been agreed);
details of risk reduction strategies that it is agreed will be taken (i.e. strategy that will reduce the
impact on the project should the risk event occur and/or the probability of its occurrence);
reduced impact and/or probability of the risk, given the above agreed risk reduction strategies have
been taken;

ranking of risk by scores of the reduced P-I;
cross-referencing the risk event to identification numbers of tasks in a project plan or areas of
operation or regulation where the risk may impact;
description of secondary risks that may arise as a result of adopting the risk reduction strategies;
action window - the period during which risk reduction strategies must be put in place.
The following items may also be useful to include:

description of other optional risk reduction strategies;
ranking of risks by the possible effectiveness of further risk mitigation [effectiveness = (total
decrease in risk)/(cost of risk mitigation action)];
fall-back plan in the event the risk event still occurs;
name of the person who first identified the risk;
date the risk was first identified;
date the risk was removed from the list of active risks (if appropriate).

A risk register should include a description of the scale used in the semi-quantitative analysis, as
explained in the section on P-I scores. A risk register should also have a summary that lists the top
risks (ten is a fairly usual number but will vary according to the project or overview level). The "top"
risks are those that have the highest combination of probability and impact (i.e. severity), after the reducing effects of any agreed risk reduction strategies have been included. Risk registers lend themselves
perfectly to being stored in a networked database. In this way, risks from each project or regulatory
body's concerns, for example, can be added to a common database. Then, a project manager can access
that database to look at all risks to his or her project. The finance director, lawyer, etc., can look at all
the risks from any project being managed by their departments and the chief executive can look at the
major risks to the organisation as a whole. What is more, head office has an easy means for assessing
the threat posed by a risk that may impact on several projects or areas at the same time. "Dashboard"
software can bring the outputs of a risk register into appropriate focus for the decision-makers.
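
Risk registers lend themselves to a database, as noted above. As a minimal sketch (the class and field names are illustrative, not a prescribed schema), the essential items in the list could be captured like this:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class RiskRegisterEntry:
    """One risk register entry, following the essential items listed above."""
    name: str
    description: str                      # what the risk is
    cause: str                            # why it would occur
    risk_drivers: list[str]               # factors changing probability or impact
    probability_score: int                # semi-quantitative, e.g. 1 (very low) to 5 (very high)
    impact_score: int                     # same scale as the probability score
    owner: str                            # person monitoring the risk
    reduction_strategies: list[str]       # agreed risk reduction actions
    residual_probability_score: int       # after the agreed reduction strategies
    residual_impact_score: int
    related_tasks: list[str] = field(default_factory=list)   # cross-references to the project plan
    secondary_risks: list[str] = field(default_factory=list)
    action_window: Optional[str] = None   # period in which reduction must be put in place
    last_modified: date = field(default_factory=date.today)
```

A project manager's view of the register is then simply a query over a collection of such entries, filtered by project, owner or residual score.
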

1.6.1 P-I tables
The risk identification stage attempts to identify all risks threatening the achievement of the project's or
organisation's goals. It is clearly important, however, that attention is focused on those risks that pose
the greatest threat.
Defining qualitative risk descriptions

A qualitative assessment of the probability P of a risk event (a possible event that would produce a
negative impact on the project or organisation) and the impact(s) it would produce, I, can be made by
assigning descriptions to the magnitudes of these probabilities and impacts. The assessor is asked to
describe the probability and impact of each risk, selecting from a predetermined set of phrases such as:
nil, very low, low, medium, high and very high. A range of values is assigned to each phrase in order
to maintain consistency between the estimates of each risk. An example of the value range that might
be given to each phrase in a risk register for a particular project is shown in Table 1.1.
Note that in Table 1.1 the value ranges are not evenly spaced. Ideally there is a multiple difference
between each range (in this case roughly 3).


Table 1.1 An example of the value ranges that could be associated with qualitative descriptions of the
probabilities and impacts of a risk on a project.

Category    Probability (%)   Delay (days)   Cost ($k)    Quality
Very high   10-50             >100           >1 000       Failure to meet acceptance criteria
High        5-10              30-100         300-1 000    Failure to meet >1 important specification
Medium      2-5               10-30          100-300      Failure to meet an important specification
Low         1-2               2-10           20-100       Failure to meet >1 minor specification
Very low    <1                <2             <20          Failure to meet a minor specification

Table 1.2 An example of the descriptions that could be associated with impacts
of a risk on a corporation.

Category        Description
Catastrophic    Jeopardises the existence of the company
Major           No longer possible to achieve business objectives
Moderate        Reduced ability to achieve business objectives
Minor           Some business disruptions but little effect on business objectives
Insignificant   No impact on business strategy objectives

If the same multiple is applied for probability and impact scales, we can more easily determine severity
scores, as described below. The value ranges can be selected to match the size of the project. Alternatively,
they can be matched to the effect the risks would have on the organisation as a whole. The drawback in
making the definition of each phrase specific to a project is that it becomes very difficult to perform a
combined analysis of the risks from all projects in which the organisation is involved. From a corporate
perspective one can describe how a risk affects the health of a company, as shown in Table 1.2.
Visualising a portfolio of risks

A P-I table offers a quick way to visualise the relative importance of all identified risks that pertain
to a project (or organisation). Table 1.3 illustrates an example. All risks are plotted on the one table,
allowing easy identification of the most threatening risks as well as providing a general picture of the
overall riskiness of the project. Risk numbers 13, 2, 12 and 15 are the most threatening in this example.

Table 1.3 Example of a P-I table for schedule delay.

The impact of a project risk that is most commonly considered is a delay in the scheduled completion
of the project. However, an analysis may also consider the increased cost of the project resulting from
each risk. It might further consider other, less numerically definable impacts on the project, for example:
the quality of the final product; the goodwill that could be lost; sociological impacts; political damage;
or strategic importance of the project to the organisation. A P-I table can be constructed for each type
of impact, enabling the decision-maker to gain a more rounded understanding of a project's riskiness.

P-I tables can be constructed for the various types of impact of each single risk. Table 1.4 illustrates
an example where the impacts of schedule delay, T, cost, $, and product quality, Q, are shown for a
specific risk. The probability of each impact may not be the same. In this example, the probability of the
risk event occurring is high, and hence the probability of schedule delay and cost impacts are high, but
it is considered that, even if this risk event does occur, the probability of a quality impact is still low. In
other words, there is a fairly small probability of a quality impact even when the risk event does occur.

Table 1.4 P-I table for a specific risk.
Ranking risks

P-I scores can be used to rank the identified risks. A scaling factor, or weighting, is assigned to each
phrase used to describe each type of impact. Table 1.5 provides an example of the type of scaling factors
that could be associated with each phrase/impact type combination.

Table 1.5 An example of the scores that could be associated with descriptive risk categories to produce
a severity score.

Category     Score
Very high    5
High         4
Medium       3
Low          2
Very low     1

In this type of scoring system, the higher the score, the greater is the risk. A base measure of risk
is probability * impact. The categorising system in Table 1.1 is on a log scale, so, to make Table 1.5
consistent, we can define the severity of a risk with a single type of impact as

Severity = P + I

which leaves the severity on a log scale too. If a risk has k possible types of impact (quality, delay,
cost, reputation, environmental, etc.), perhaps with different probabilities for each impact type, we can
still combine them into one score as follows:

Severity = log3[ 3^(P1 + I1) + 3^(P2 + I2) + ... + 3^(Pk + Ik) ]
The severity scores are then used to determine the most important risks, enabling the management to
focus resources on reducing or eliminating risks from the project in a rational and efficient manner.
A drawback to this approach of ranking risks is that the process is quite dependent on the granularity
of the scaling factors that are assigned to each phrase describing the risk impacts. If we have better
information on probability or impact than the scoring system would allow, we can assign a more accurate
(non-integer) score.
In the scoring regime of Table 1.5, for example, a high-severity risk could be defined as having a
score higher than 7, and a low-severity risk as having a score lower than 5. Given the crude scaling used,
risks with a severity of 7 may require further investigation to determine whether they should be categorised
as high severity. Table 1.6 shows how this segregates the risks shown in a P-I table into the three
regions.
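
A minimal sketch of the scoring and banding just described (risk names and scores are invented; the severity here is simply the P score plus the I score, and the bands use the >7 and <5 boundaries given above):

```python
# Each risk has a probability score P and an impact score I on the 1-5 scale of Table 1.5.
risks = {
    "supplier insolvency":  {"P": 2, "I": 5},
    "late design approval": {"P": 4, "I": 3},
    "test rig unavailable": {"P": 3, "I": 2},
    "minor spec shortfall": {"P": 2, "I": 1},
}

def band(severity: int) -> str:
    """Band a severity score using the boundaries described in the text."""
    if severity > 7:
        return "high"
    if severity < 5:
        return "low"
    return "medium"  # scores of 5-7; a 7 may warrant further investigation

# Rank the risks by severity, highest first.
for name, s in sorted(risks.items(), key=lambda kv: kv[1]["P"] + kv[1]["I"], reverse=True):
    severity = s["P"] + s["I"]
    print(f"{name:22s} severity = {severity}  band = {band(severity)}")
```
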
P-I scores for a project provide a consistent measure of risk that can be used to define metrics and
perform trend analyses. For example, the distribution of severity scores for a project gives an indication
of the overall "amount" of risk exposure. More complex metrics can be derived using severity scores,
allowing risk exposure to be normalised and compared with a baseline status. These permit trends in risk
exposure to be identified and monitored, giving valuable information to those responsible for controlling
the project.
Efficient risk management with severity scores

Efficient risk management seeks to achieve the maximum reduction in risk for a given amount of
investment (of people, time, money, restriction of liberty, etc.). Thus, we need to evaluate in some
sense the ratio (reduction in risk)/(investment to achieve reduction). If you use the log scale for severity
described here, this would equate to calculating

Efficiency = [3^(severity before mitigation) - 3^(severity after mitigation)] / (cost of the mitigation action)

The risk management options that provide the greatest efficiency should logically be preferred, all
else being equal.
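
Continuing that sketch, and assuming the same base-3 de-logging of severity scores used above (an assumption of this example, since each score band spans roughly a factor of 3), competing mitigation options for a risk could be compared on efficiency like this:

```python
# Candidate mitigation options for one risk: severity scores before and after
# mitigation, and the cost of the action in $k. All figures are invented.
options = [
    {"name": "redundant supplier", "before": 8, "after": 5, "cost": 120},
    {"name": "extra inspections",  "before": 8, "after": 6, "cost": 40},
    {"name": "insurance",          "before": 8, "after": 4, "cost": 300},
]

def efficiency(before: int, after: int, cost: float, base: float = 3.0) -> float:
    """Reduction in (de-logged) risk per unit of mitigation cost."""
    return (base ** before - base ** after) / cost

for opt in sorted(options, key=lambda o: efficiency(o["before"], o["after"], o["cost"]),
                  reverse=True):
    print(f"{opt['name']:20s} efficiency = {efficiency(opt['before'], opt['after'], opt['cost']):.1f}")
```
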
Inherent risks are the risk estimates before accounting for any mitigation efforts. They can be plotted
against a guiding risk response framework where the P-I table is split, covered by overlapping areas
of avoid, control, transfer and accept, as shown in Figure 1.8:
"Avoid" applies where an organisation would be accepting a high-probability, high-impact risk
without any compensating benefits.
"Control" applies usually to high-probability, low-impact risks, normally associated with repetitive
actions, and therefore usually managed through better internal processes.
"Transfer" applies to low-probability, high-impact risks usually managed through insurance or other
means of transferring the risk to parties better capable of absorbing the impact.
"Accept" applies to the remaining low-probability, low-impact risks for which it may not be effective
to focus on too much.
Figure 1.8 P-I graph for inherent risks.

Figure 1.9 P-I graph for residual risks.

Figure 1.9 plots residual risks after any implemented risk mitigation strategies and tracks the progress
in managing the residual risks compared with the previous year using arrows. Grey letters represent the
status of the risk last year if it is different. A dashed arrow pointing out of the graph means that the risk
has been avoided. An enhancement to the residual risk graph that you might like to add is to plot each
risk as a circle whose radius reflects how comfortable you are in dealing with the residual risk - for
example, perhaps you have handled the occurrence of similar risks before and minimised their impact
through good management, or perhaps they got out of hand. A small circle represents risks that one is
comfortable managing, and a large circle represents the opposite, so the less manageable risks stand out
in the plot.
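
A sketch of that residual-risk plot, assuming matplotlib and invented scores; the circle drawn for each risk grows with how uncomfortable one is about managing it, so the less manageable risks stand out:

```python
import matplotlib.pyplot as plt

# Residual risks: probability score, impact score and a discomfort rating
# where larger numbers mean the risk feels harder to manage. Values invented.
residual_risks = {
    "A": {"P": 4, "I": 2, "discomfort": 1},
    "B": {"P": 2, "I": 4, "discomfort": 3},
    "C": {"P": 3, "I": 3, "discomfort": 2},
    "D": {"P": 1, "I": 5, "discomfort": 5},
}

fig, ax = plt.subplots()
for name, r in residual_risks.items():
    # Marker area scales with the discomfort rating.
    ax.scatter(r["P"], r["I"], s=300 * r["discomfort"], alpha=0.4, edgecolors="black")
    ax.annotate(name, (r["P"], r["I"]), ha="center", va="center")

ax.set_xlim(0, 5.5)
ax.set_ylim(0, 5.5)
ax.set_xlabel("Probability score")
ax.set_ylabel("Impact score")
ax.set_title("Residual risks (larger circles are less comfortably managed)")
plt.show()
```
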

Chapter 2

Planning a risk analysis
In order to plan a risk analysis properly, you'll need to answer a few questions:
What do you want to know and why?
What assumptions are acceptable?
What is the timing?
Who is going to do the risk analysis?
I'll go through each of these in turn.

2.1 Questions and Motives
The purpose of a risk analysis is to provide information to help make better decisions in an uncertain
world. A decision-maker has to work with the risk analyst precisely to define the questions that need
answering. You should consider a number of things:

1. Rank the questions that need answering from "critical" down to "interesting". Often a single model
cannot answer all questions, or has to be built in a complicated way to answer several questions,
so a common recognition of the extra effort needed to answer each question going down the list
helps determine a cut-off point.
2. Discuss with the risk analyst the form of the answer. For example, if you want to know how much
extra revenue might be made by buying rather than leasing a vessel, you'll need to specify a
currency, whether this should be as a percentage or in actual currency and whether you want just
the mean (which can make the modelling a lot easier) or a graph of the distribution. Explain what
statistics you need and to what accuracy (e.g. asking for the 95th percentile to the nearest $1000),
as this will help the risk analyst save time or figure out that an unusual approach might be needed
to get the required accuracy.
3. Explain what arguments will be based on these outputs. I am of the view that this is a key breakdown
area because a decision-maker might ask for specific outputs and then put them together into an
argument that is probabilistically incorrect. Much embarrassment and frustration all round. It is
better to explain the arguments (e.g. comparing with the distribution of another potential project's
extra revenue) that would be put forward and find out if the risk analyst agrees that this is technically
correct before you get started.
4. Explain whether the risk analysis has to sit within a framework. This could be a formal framework,
like a regulatory requirement or a company policy, or it could be informal, like building up a
portfolio of risk analyses that can be compared on the same footing (for example, we are helping a
large chemical manufacturer to build up a combined toxicological, environmental, etc., risk analysis
database for their treasure chest of compounds). It will help the risk analyst ensure the maximum
level of compatibility - e.g. that the same base assumptions are used between risk analyses.
5. Explain the target audience. We write reports on all our risk analyses, of course, but sometimes
there can be several versions: the executive summary; the main report; and the technical report
with all the formulae and guides for testing. Often, others will want to run the model and change
parameters, so we make a model version that minimises the ability to mess up the mathematics,
and write the code to allow the most flexibility. These days we usually put a VBA user interface on
the front to make life easier and perhaps add a reporting facility to compare results. We might add
a help file too. Clients will also sometimes ask us to prepare a Powerpoint presentation. Knowing
the knowledge level and focus of each target audience, and knowing what types of reporting will
be needed at the outset, saves a lot of time.
6. Discuss any possible hostile reactions. The results of a risk analysis will not always be popular,
and when people dislike the answers they start attacking the model (or, if you're unlucky, the
modeller). Assumptions are the primary Achilles' heel, as we can argue forever about whether
assumptions are right. I talk about getting buy-in for assumptions in Section 5.2. Statistical analysis
of data is also rather draining - it usually involves a couple of very technical people with opposing
arguments about the appropriateness of a statistical procedure that nobody else understands. The
decision to include and exclude certain datasets can also create a lot of tension. The arguments can
be minimised, or at least convincingly dismissed, if people likely to be hostile are brought into the
analysis process early, or an external expert is asked to give an independent review.
7. Figure out a timeline. Decision-makers have something of a habit of setting unrealistic deadlines.
When these deadlines pass, nothing very dramatic usually happens, as the deadlines are some
artificial internal confection. Our consultants deal with deadlines all the time, of course, but we
openly discuss whether a deadline is really that important because, if we have to meet a tight
deadline (and that happens), the quality of the risk analysis may be lower than would have been
achievable with more time. The decision-maker has to be honest about time limits and decide
whether it is worth postponing things for a bit.
8. Figure out the priority level. The risk analyst might have other work to juggle too. The project
might be of high importance and justify pulling off other resources to help with the analysis or
instructing others in the organisation to set aside time to provide good quality input.
9. Decide on how regularly the decision-maker and risk analyst will meet. Things change and the risk
analysis may have to be modified, so find that out sooner rather than later.

2.2 Determine the Assumptions that are Acceptable or Required
If a risk analysis is to sit within a certain framework, discussed above, it may well have to comply
with a set of common assumptions to allow meaningful comparisons between the results of different
analyses. Sometimes it is better not to revise some assumptions for a new analysis because it makes it
impossible to compare. You can often see a similar problem with historic data, e.g. calculating crime or
unemployment statistics. It seems that the basis for these statistics keeps changing, making it impossible
to know whether the problem is getting better or worse.
In a corporate environment there will be certain base assumptions used for things like interest and
exchange rates, production capacity and energy price. The same assumptions should be used in all
models. In a risk analysis world these should be probabilistic forecasts, but they are nonetheless often
fixed-point values. Oil companies, for example, have the challenging job of figuring out what the oil
price might be in the future. They can get it very wrong, so they often take a low price for planning purposes,
e.g. $16 a barrel, which in 2007 might seem rather unlikely for the future. The risk analyst working
hard on getting everything else really precise could find such an assumption irritating, but it allows
consistency between analyses where oil price forecast uncertainty could be so large as to mask the
differences between investment opportunities.
Some assumptions we make are conservative, meaning that, if, for example, we need a certain percentile of the output to be above X before we accept the risk as acceptable, then a conservative
assumption will bias the output to lower values. Thus, if the output still gives numbers that say the risk
is acceptable, we know we are on pretty safe ground. Conservative assumptions are most useful as a
sensitivity tool to demonstrate that one has not taken an unacceptable risk, but they are to be avoided
whenever possible because they run counter to the principle of risk analysis which is to give an unbiased
report of uncertainty.

2.3 Time and Timing
We get a lot of requests to help "risk" a model. The potential client has spent a few months working
on a problem, building up a cashflow model, etc., and the decision-makers decide the week before the
board meeting that they really should have a risk analysis done.
If done properly, risk analysis is an integral part of the planning of a project, not an add-on at the
end. One of the prime reasons for doing risk analyses is to identify risks and risk management strategies
so the decision-makers can decide how the risks can be managed, which could well involve a revision
of the project plan. That can save a lot of time and money on a project. If risk analysis is added on at
the end, you lose all that potential benefit.
The data collection efforts required to produce a fixed-value model of a project are little different
from the efforts required for a risk analysis, so adding a risk analysis on at the end is inefficient and
delays a project, as the risk analyst has to go back over previous work.
We advocate that a risk analyst write the report as the model develops. This helps keep a track of
what one is doing and makes it easier to meet the report submission deadline at the end. I also like to
write down my thinking because it helps me spot any mistakes early.
Finally, try to allow the risk analyst enough time to check the model for errors and get it reviewed.
Chapter 16 offers some advice on model validation.

2.4 You'll Need a Good Risk Analyst or Team
If the risk analysis is a one-off and the outcome is important to you, I recommend you hire in a
consultant risk analyst. Well I would say that, of course, but it does make a lot of sense. Consultants
are expensive on a daily basis, but, certainly at Vose Consulting, we are far faster (my guess is over
10 times faster than a novice) - we know what we're doing and we know how to communicate and
organise effectively. Please don't get a bright person within your organisation, install some risk analysis
software on their computer and tell them to get on with the job. It will end in tears.
The publishers of risk analysis software (Crystal Ball, @RISK, Analytica, Risk+, PERTmaster, etc.)
have made risk analysis modelling very easy to implement from a software viewpoint. The courses they


teach show you how to drive the software and reinforce the notion that risk analysis modelling is pretty
easy (Vose Consulting courses generally assume you have already attended a software familiarisation
course). In a lot of cases, risk analysis is in fact pretty easy, as long as you avoid some common
basic errors discussed in Section 7.4. However, it can also become quite tricky too, for sometimes
subtle reasons, and you should have someone who understands risk analysis well enough to be able to
recognise and handle the trickier models. Knowing how to use Excel won't make you an accountant
(but it's a good first step), and learning how to use risk analysis software won't make you a risk analyst
(but it's also a good first step).
There are still very few tertiary courses in risk analysis, and these courses tend to be highly focused in
particular areas (financial modelling, environmental risk assessment, etc.). I don't know of any tertiary
courses that aim to produce professional risk analysts who can work across many disciplines. There
are very few people who could say they are qualified to be a risk analyst. This makes it pretty tough
to know where to search and to be sure you have found someone who will have the knowledge to
analyse your risks properly. It seems that industry-specific risk analysts also have little awareness of the
narrowness of their knowledge: a little while ago we advertised for two highly qualified actuarial and
financial risk analysts with several years' experience and received a large number of applications from
people who were risk analysts in toxicology, microbial, environmental and project areas with almost no
overlap in required skill sets.

2.4.1 Qualities of a risk analyst
I often get asked by companies and government agencies what sort of person they should look for to
fill a position as a risk analyst. In my view, candidates should have the following characteristics:
Creative thinkers. Risk analysis is about problem-solving. This is at the top of my list and is the
rarest quality.
Confident. We often have to come up with original solutions. I've seen too many pieces of work
that have followed some previously published method because it is "safer". We also have to present
to senior decision-makers and maybe defend our work in front of hostile stakeholders or a court.
Modest. Too many risk analyses fail to meet their requirements because of a risk analyst who thought
she/he could do it without help or consultation.
Thick-skinned. Risk analysts bring together a lot of disparate information and ideas, sometimes
conflicting, sometimes controversial, and we produce outputs that are not always what people want
to see, so we have to be prepared for a fair amount of enthusiastic criticism.
Communicators. We have to listen to a lot of people and present ideas that are new and sometimes
difficult to understand.
Pragmatic. Our models could always be better with more time, data and resources, but decision-makers have deadlines.
Able to conceptualise. There are a lot of tools at our disposal that are developed in various fields of
risk, so the risk analyst needs to read widely and be able to extrapolate an idea from one application
to another.
Curious. Risk analysts need to keep learning.
Good at mathematics. Take a look at Part 2 of this book to get a feel for the level. It will depend
on the area: project risk requires more intuition and perseverance but less mathematics, insurance


and finance require intuition and high mathematical skills, food safety requires medium levels of
everything.
A feel for numbers. It is one thing to be good at mathematics, but we also have to have an idea
of where the numbers should lie because it (a) helps us check the work and (b) allows us to know
where we can take shortcuts.
Finishers. Some people are great at coming up with ideas, but lose interest when it comes to
implementing them. Risk analysts have to get the job done.
Cynical. We have to maintain a healthy cynicism about published work and about how good our
subject matter experts are.
Pedantic. When developing probability models, one needs to be very precise about exactly what
each variable represents.
Careful. It is easy to make mistakes.
Social. We have to work in teams.
Neutral. Our job is to produce an objective risk analysis. A project manager is not usually ideal
to perform the project risk analysis because it may reflect on his/her ability to manage and plan. A
scientist is not ideal if she/he has a pet theory that could slant the approach taken.
It's a demanding list and indicates, I think, that risk analysis should be performed by people of high
skill levels who are fairly senior and in a respected position within a company or agency. It is also
rather unlikely that you will find all these qualities in the one person: the best risk analysis units with
which we work are composed of a number of individuals with complementary skills and strengths.

2.4.2 Suitable education
I interviewed a statistics student a couple of months back. This person was just finishing a PhD and had
top grades throughout from a very reputable school. I asked a pretty simple question about estimating a
prevalence and got a vague answer about how this person would perform the appropriate test and report
the confidence interval, but the student couldn't tell me what that test might be (this is a really basic
Statistics 101-type question). I offered some numbers and asked what the bounds might roughly be, but
the interviewee had absolutely no idea. With each question it became very clear that this person had
been taught a lot of theory but had no feel for how to use it, and no sense of numbers. We didn't hire.
I interviewed another person who had written a very sophisticated traffic model using discrete event
simulation (which we use a fair bit) that was helping decide how to manage boat traffic. The model
predicted that putting in traffic lights on the narrow part of some waterway would produce a horrendous
number of crashes at the traffic light queues, easily outweighing the crashes avoided by letting vessels
pass each other in the narrow part of the waterway. Conclusion: no traffic lights. That seemed strange
to me and, after some thought, the interviewee explained it was probably because the model used a
probability of crashing that was inversely proportional to the distance between the vessels, and vessels
in a queue are very close, so the model generated lots of crashes. But they are also barely moving, I
pointed out, so the probability of a collision will be lower at a given distance for vessels at the lights
than for vessels passing each other at speed, and any contact between waiting vessels would have a
negligible effect. The modeller responded that the probability could be changed. We didn't hire that
person either, because the modeller had never stepped back and asked "does this make sense?".
I interviewed a student who was just finishing a Masters degree and was writing up a thesis on
applying probability models from physics to financial markets. This person explained that studying had


become rather dull because it was always about learning what others had done, but the thesis was a
different story because there was a chance to think for oneself and come up with something new. The
student was very enthusiastic, had great mathematics and could really explain to me what the thesis was
about. We hired and I have no regrets.
A prospective hire for a risk analysis position will need some sort of quantitative background. I
think the best candidates tend to have a background that combines attempting to model the real world
with using the results to make decisions. In these areas, approximations and the tools of approximation
are embraced as necessary and useful, and there is a clear purpose to modelling that goes beyond the
academic exercise of producing the model itself. Applied physics, engineering, applied statistics and
operations research are all very suitable. Applied physics is the most appealing of all of them (I may be
biased, I studied physics as an undergraduate) because in physics we hypothesise how the world might
work, describe the theory with mathematics, make predictions and figure out an experiment that will
challenge the theory, perform the experiment, collect and analyse data and conclude whether our theory
was supported. Learning this basic thinking is extraordinarily valuable: risk analysis follows much of
the same process, uses many of the same modelling and statistical techniques, makes approximations
and should critically review scientific data when relevant. Most published papers describe studies that
were designed to show supportive evidence for someone's theory.
Pure mathematics and classical statistics are not that great: pure mathematics is too abstract; we find
that pure statistics teaching is very constrained, and encourages formulaic thinking and reaching for a
computer rather than a pen and paper. The schools also don't seem to emphasise communication skills
very much. It's a shame because the statistician has so much of the basic knowledge requirements.
Bayesian statistics is somewhat better - it does not have such a problem with subjective estimates, its
techniques are more conducive to risk analysis and it's a newer field, so the teaching is somewhat less
staid. Don't be swayed by a six-sigma black belt qualification - the ideas behind Six Sigma certainly
have merit, but the technical knowledge gained to get a black belt is quite basic and the production-line
teaching seems to be at the expense of in-depth understanding and creativity. The main things you will
need to look out for are a track record of independent thinking, strong communication skills and some
reasonable grasp of probability modelling. The more advanced techniques can be learned from courses
and books.

2.4.3 Our team
I thought it might be helpful to give you a brief description of how we organise our teams. If your
organisation is large enough to need 10 or more people in a risk analysis team, you might get some
ideas from how we operate.
Vose Consulting has quite a mixture of people, roughly split into three groups, and we seem to have
hired organically to match people's skills and characters to the roles of these groups. I love to learn,
teach, develop new talent and dream up new ideas, so my team is made up of conceptual thinkers with
great mathematics, computing and researching skills. They are young and very intelligent, but are too
young for us to put them into the most stressful jobs, so part of my role is to give them challenging
work and the confidence to meet consulting deadlines by solving their problems with them. My office
is the nursery for Huybert's team to which they can migrate once they have more experience. Huybert
is an ironman triathlon competitor with boundless energy. His consulting group fly around everywhere
solving problems, writing reports and meeting deadlines. They are real finishers and my team provide
as much technical support as they need (though they are no slouches, we have four quantitative PhDs
and nobody with less than a Masters degree in that team). Timour is a very methodical, deep thinker.


Unlike me, he tends not to say anything unless he has something to say. His programming group writes
our commercial software like ModelRisk, requiring a long-term development view, but he has a couple
of people who write bespoke software for our clients meeting strict deadlines too.
When we get a consulting enquiry, the partners will discuss whether we have the time and knowledge
to do the job, who it would involve and who would lead it. Then the prospective lead is invited to
talk with us and the client about the project and then takes over. The lead consultant has to agree to
do the project, his/her name and contact details are put on the MOU and he/she remains in charge and
responsible to the client throughout the project. A partner will monitor progress, or a partner could be
the lead consultant. The lead consultant can ask anyone within the company for advice, for manpower
assistance, to review models and reports, to write bespoke software for the client, to be available for a
call with the client, etc. I like this approach because it means we spread around the satisfaction of a job
well done, it encourages responsibility and creativity, it emphasises a flat company structure and we all
get to know what others in the company can do, and because the poor performance in a project would
be the company's failure, not one individual's.
I read Ricardo Semler's book Maverick a few months ago and loved it for showing me that much of
what we practise in our small company can work in a company as large as Semco. Semco also works
in groups that mix around depending on the project and has a flat hierarchy. We give our staff a lot of
responsibility, so we also assume that they are responsible: we give them considerable freedom over
their working hours and practices, we expect them to keep expenses at a sensible level, but don't set
daily rates, etc. Staff choose their own computers, can buy a printer, etc., without having to get approval.
The only thing we have no flexibility on is honesty.

Chapter 3

The quality of a risk analysis
We've seen a fair number of quantitative risk analyses that are terrible. They might also have been very
expensive, taken a long time to complete and used up valuable human resources. In fact, I'll stick my
neck out and say the more complex and expensive a quantitative risk analysis is, the more likely it is
to be terrible. Worst of all, the people making decisions on the results of these analyses have little if
any idea of how bad they are. These are rather attention-grabbing sentences, but this chapter is small
and I would really like you not to skip over it: it could save you a lot of heartache.
In our company we do a lot of reviews of models for decision-makers. We'd love to be able to
say "it's great, trust the results" a lot more often than we do, and I want to spend this short chapter
explaining what, in our experience, goes wrong and what you can do about it. First of all, to give some
motivation for this chapter, I want to show you some of the results of a survey we ran a couple of years
ago in a well-developed science-based area of risk analysis (Figure 3.1). The question appears in the
title of each pane. Which results do you find most worrying?

3.1 The Reasons Why a Risk Analysis can be Terrible


From Figure 3.1 I think you'll see that there really needs to be more communication between decision-makers and their risk analysts and a greater attempt to work as a team. I see the risk analyst as an
important avenue of communication between those "on the ground" who understand the problem at
hand and hold the data and those who make decisions. The risk analyst needs to understand the context
of the decision question and have the flexibility to be able to find the method of analysis that gives
the most useful information. I've heard too many risk analysts complain that they get told to produce
a quantitative model by the boss, but have to make the numbers up because the data aren't there. Now
doesn't that seem silly? I'm sure the decision-maker would be none too happy to know the numbers
are all made up, but the risk analyst is often not given access to the decision-makers to let them know.
On the other hand, in some business and regulatory environments they are trying to follow a rule that
says a quantitative risk analysis needs to be completed - the box needs ticking.
Regulations and guidelines can be a real impediment to creative thinking. I've been in plenty of committees gathered to write risk analysis guidelines, and I've done my best to reverse the tendency to be
formulaic. My argument is that in 19 years we have never done the same risk analysis twice: every one
has its individual peculiarities. Yet the tendency seems to be the reverse: I trained over a hundred consultants in one of the big four management consultancy firms in business risk modelling techniques, and they
decided that, to ensure that they could maintain consistency, they would keep it simple and essentially fill
in a template of three-point estimates with some correlation. I can see their point - if every risk analyst
developed a fancy and highly individual model it would be impossible to ensure any quality standard.
The problem is, of course, that the standard they will maintain is very low. Risk analysis should not be a
packaged commodity but a voyage of reasoned thinking leading to the best possible decision at the time.


Figure 3.1 Some results of a survey of 39 professional risk analysts working in a scientific field where risk
analysis is well developed and applied very frequently. One pane asked "What factors jeopardise the value of
an assessment?" (answered Usually, 50:50, Seldom or Never) for factors including: insufficient human resources
to complete the assessment; insufficient time to complete the assessment; insufficient data to support the risk
assessment; insufficient in-house expertise in the area; and insufficient general scientific knowledge of the area.


I think it is usually pretty easy to see early on in the risk analysis process that a quantitative risk
analysis will be of little value. There are several key areas where it can fall down:
1. It can't answer all the key questions.
2. There are going to be a lot of assumptions.
3. There is going to be one or more show-stopping assumptions.
4. There aren't enough good data or experts.

We can get around 1 sometimes by doing different risk analyses for different questions, but that can
be problematic when each risk analysis has a different set of fundamental assumptions - how do we
compare their results?
For 2 we need to have some way of expressing whether a lot of little assumptions compound to make
a very vulnerable analysis: if you have 20 assumptions (and 20 is quite a small number), all pretty good
ones - say each has a 90 % chance of being correct - and the analysis is only useful if all the
assumptions are correct, then we have only a 0.9^20 ≈ 12 % chance that the assumption set is correct.
Of course, if this were the real problem we wouldn't bother writing models. In reality, in the business
world particularly, we deal with assumptions that are good enough because the answers we get are
close enough. In some more scientific areas, like human health, we have to deal with assumptions such
as: compound X is present; compound X is toxic; people are exposed to compound X; the exposure is
sufficient to cause harm; and treatment is ineffective. The sequence then produces the theoretical human
harm we might want to protect against, but if any one of those assumptions is wrong there is no human
health threat to worry about.
If 3 occurs we have a pretty good indication that we don't know enough to produce a decent risk analysis model, but maybe we can produce two or three crude models under different possible assumptions
and see whether we come to the same conclusion anyway.
Area 4 is the least predictable because the risk analyst doing a preliminary scoping can be reassured
that the relevant data are available, but then finds out they are not available either because the data turn
out to be clearly wrong (we see this a lot), the data aren't what was thought, there is a delay past the
deadline in the data becoming available or the data are dirty and need so much rework that it becomes
impractical to analyse them within the decision timeframe.
There is a lot of emphasis placed on transparency in a risk analysis, which usually manifests itself
in a large report describing the model, all the data and sources, the assumptions, etc., and then finishes
with some of the graphical and numerical outputs described in Chapter 5. I've seen reports of 100 or
200 pages that seem far from transparent to me - who really has the time or inclination to read such a
document? The executive summary tends to focus on the decision question and numerical results, and
places little emphasis on the robustness of the study.

3.2 Communicating the Quality of Data Used in a Risk Analysis
Elsewhere in this book you will find lots of techniques for describing the numerical accuracy that a
model can provide given the data that are available. These analyses are at the heart of a quantitative
risk analysis and give us distributions, percentiles, sensitivity plots, etc.
In this section I want to discuss how we can communicate any impact on the robustness of a model
owing to the assumptions behind using data or settling on a model scope and structure. Elsewhere in this
book I encourage the risk analyst to write down each assumption that is made in developing equations

and performing statistical analyses. We get participants to do the same in the training courses we teach
as they solve simple class exercises, and there is general surprise at how many assumptions are implicit
in even the simplest type of equation. It becomes rather onerous to write all these assumptions down,
but it is even more difficult to convert the conceptual assumptions underpinning our probability models
into something that a reader rather less familiar with probability modelling might understand.

Table 3.1 Pedigree matrix for parameter strength (adapted from Boone et al., 2007). Each of the four
categories is scored from 4 (strongest) down to 0 (weakest).

Score 4. Proxy: exact measure of the desired quantity (representative). Empirical: large sample, direct
measurements, recent data, controlled experiments. Method: best available practice in a well-established
discipline (accredited method for sampling / diagnostic test). Validation: compared with independent
measurements of the same variable over a long domain, rigorous correction of errors.

Score 3. Proxy: good fit or measure. Empirical: small sample, direct measurements, less recent data,
uncontrolled experiments, low non-response rate. Method: reliable and common method; best practice in
an immature discipline.

Score 2. Proxy: well correlated but not the same thing. Empirical: several expert estimates in general
agreement. Method: acceptable method but limited consensus on reliability.

Score 1. Proxy: weak correlation (very large geographical differences). Empirical: one expert opinion,
rule-of-thumb estimate. Method: method with unknown reliability. Validation: weak, very indirect validation.

Score 0. Proxy: not clearly connected. Empirical: crude speculation. Method: no discernible rigour.
Validation: no validation.
The NUSAP (Numeral Unit Spread Assessment Pedigree) method (Funtowicz and Ravetz, 1990) is
a notational system that communicates the level of uncertainty for data in scientific analysis used for
policy making. The idea is to use a number of experts in the field to score independently the data under
different categories. The system is well established as being useful in toxicological risk assessment. I
will describe here a generalisation of the idea. Its key attractions are that it is easy to implement and
can be summarised into consistent pictorial representations. In Table 3.1 I have used the categorisation
descriptions of data from van der Sluijs, Risbey and Ravetz (2005), which are: proxy - reflecting
how close data being used are to ideal; empirical - reflecting the quantity and quality of the data;
method - reflecting where the method used to collect the data lies between careful and well established
and haphazard; and validation - reflecting whether the acquired data have been matched to real-world
experience (e.g. does an effect observed in a laboratory actually occur in the wider world).
Each dataset is scored in turn by each expert. The average of all scores is calculated and then divided
by the maximum attainable score of 4. For example:
              Proxy   Empirical   Method   Validation
Expert A        3         2          4          3
Expert B        3         2          4          3
Expert C        2         1          3          4

gives an average score of 2.833. Dividing by the maximum score of 4 gives 0.708. An additional level
of sophistication is to allow the experts to weight their level of expertise for the particular variable in
question (e.g. 0.3 for low, 0.6 for medium and 1.0 for high, as well as allowing experts to decline to
comment when a variable is outside their competence), in which case one calculates a weighted average
score. One can then plot these scores together and segregate them by different parts of the analysis if
desired, which gives an overview of the robustness of the data used in the analysis (Figure 3.2).

Figure 3.2 Plot of average scores for datasets in a toxicological risk assessment, with the parameters grouped
into release, exposure, toxicity and treatment effectiveness.
Scores can be generally categorised as follows:
    < 0.2          weak
    0.2 - 0.4      moderate
    > 0.4 - 0.8    high
    > 0.8          excellent

So, for example, Figure 3.2 shows that the toxicity part of the analysis appears to be the weakest, with
several datasets in the weak category.
We can summarise the scores for each dataset using a kite diagram to give a visual "traffic light",
green indicating that the parameter support is excellent, red indicating that it is weak and one or two
levels of orange representing gradations between these extremes. Figure 3.3 gives an example: one
works from the centre-point, marking on the axes the weighted fraction of all the experts considering
the parameter support to be "excellent", then adds the weighted fraction considering the support to be
"high", etc. These points are then joined to make the different colour zones - from green in the centre
for "excellent", through yellow and orange, to red in the last category: a kite will be green if all experts
agree the parameter support is excellent and red for weak. Plotting these kite diagrams together can
give a strong visual representation: a sea of green should give great confidence, a sea of red says the
risk analysis is extremely weak. In practice, we'll end up with a big mix of colours, but over time one
can get a sense of what colour mix is typical, when an analysis is comparatively weak or strong and
when it can be relied upon for your field.
The only real impediment to using the system above is that you need to develop a database software
tool. Some organisations have developed their own in-house products that are effective but somewhat
limited in their ability for reviewing, sorting and tracking. Our software developers have it on their "to
do" list to make a tool that can be used across an organisation, where one can track the current status

of a risk analysis, drill down to see the reasons for the vulnerability of a parameter, etc., so you might
like to visit www.vosesoftware.com and see if we've got anywhere yet.

Figure 3.3 A kite diagram summarising the level of data support the experts believe that a model parameter
will have: red (dark) in the outer band = weak; green (light) in the inner band = excellent. The axes are the
pedigree categories (proxy, empirical, method, validation).

3.3 Level of Criticality
The categorisation system of Section 3.2 helps determine whether a parameter is well supported, but
it can still misrepresent the robustness of the risk analysis. For example, we might have done a food
safety microbial risk analysis involving 10 parameters - nine enjoy high or excellent support, and one
is suffering weak support. If that weakly supported parameter is defining the dose-response relationship
(the probability a random individual will experience an adverse health effect given the number of
pathogenic organisms ingested), then the whole risk analysis is jeopardised because the dose-response
is the link between all the exposure pathways and the amount of pathogen involved (often a big model)
and the size of human health impact that results. It is therefore rather useful to separate the kite
diagrams and other analyses into different categories for the level of dependence the analysis has on
each parameter: critical, important or small, for example.
A more sophisticated version for separating the level of dependence is statistically to analyse the degree
of effect each parameter has on the numerical result; for example, one might look at the difference in
the mean of the model output when the parameter distribution is replaced by its 95th and 5th percentile.
Taking that range and multiplying by (1 - the support score), giving 0 for excellent and 1 for terrible,
gives one a sense of the level of vulnerability of the output numbers. However, this method suffers
other problems. Imagine that we are performing a risk analysis on an emerging bacterium for which
we have absolutely no dose-response data, so we use a dataset for a surrogate bacterium that we
think will have a very similar effect (e.g. because it produces a similar toxin). We might have large
amounts of excellent data for the surrogate bacterium and may therefore have little uncertainty about the
dose-response model, so using the 5th and 95th percentiles of the uncertainty about that dose-response
model will result in a small change in the output and multiplying that by (1 - the support score) will
under-represent the real uncertainty. A second problem is that we often estimate two or more model
parameters from the same dataset; for example, a dose-response model often has two or three parameters


that are fitted to data. Each parameter might be quite uncertain, but the dose-response curve can be
nonetheless quite stable, so this numerical analysis needs to look at the combined effect of the uncertain
parameters as a single entity, which requires a fair bit of number juggling.
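
To illustrate the percentile-swap idea numerically, the following sketch uses an invented toy model and
invented distributions (so the numbers mean nothing in themselves); it simply shows the mechanics of
comparing the output mean with a parameter pinned at its 5th and then its 95th percentile and scaling
the swing by (1 - support score).

```python
# Toy illustration of the criticality measure described above: the effect on the
# output mean of pinning one parameter at its 5th / 95th percentile, scaled by
# (1 - pedigree support score). The model and all numbers are invented.
import random

random.seed(1)

def model(dose, response_rate):
    """A deliberately simple stand-in model: expected number of cases."""
    return dose * response_rate

def output_mean(dose_sampler, rate_sampler, iterations=20000):
    return sum(model(dose_sampler(), rate_sampler())
               for _ in range(iterations)) / iterations

# Uncertain parameter: response rate ~ Normal(0.02, 0.005), support score 0.35 ("moderate")
rate = lambda: random.gauss(0.02, 0.005)
rate_p05 = lambda: 0.02 - 1.645 * 0.005   # 5th percentile of that normal
rate_p95 = lambda: 0.02 + 1.645 * 0.005   # 95th percentile
dose = lambda: random.lognormvariate(4.0, 0.5)

support_score = 0.35
swing = abs(output_mean(dose, rate_p95) - output_mean(dose, rate_p05))
vulnerability = swing * (1 - support_score)
print(round(swing, 2), round(vulnerability, 2))
```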

3.4 The Biggest Uncertainty in a Risk Analysis
The techniques discussed above have focused on the vulnerability of the results of a risk analysis to
the parameters of a model. When we are asked to review or audit a risk analysis, the client is often
surprised that our first step is not to look at the model mathematics and supporting statistical analyses,
but to consider what the decision questions are, whether there were a number of assumptions, whether
it would be possible to do the analysis a different (usually simpler, but sometimes more complex and
precise) way and whether this other way would give the same answers, and to see if there are any
means for comparing predictions against reality. What we are trying to do is see whether the structure
and scope of the analysis are correct. The biggest uncertainty in a risk analysis is whether we started
off analysing the right thing and in the right way.
Finding the answer is very often not amenable to any numerical technique because we will not have
any alternative to compare against. If we do, it might nonetheless take a great deal of effort to put
together an alternative risk analysis model, and a model audit is usually too late in the process to be
able to start again. A much better idea, in my view, is to get a sense at the beginning of a risk analysis
of how confident we should be that the analysis will be scoped sufficiently broadly, or how confident
we are that the world is adequately represented by our model. Needless to say, we can also start rather
confident that our approach will be quite adequate and then, once having delved into the details of
the problem, find out we were quite mistaken, so it is important to keep revisiting our view of the
appropriateness of the model.
We encourage clients, particularly in the scientific areas of risk in which we work, to instigate a solid
brainstorming session of experts and decision-makers whenever it has been decided that a risk analysis
is to be undertaken, or maybe is just under consideration. The focus is to discuss the form and scope
of the potential risk analysis. The experts first of all need to think about the decision questions, discuss
with decision-makers any possible alternatives or supplements to those questions and then consider how
they can be answered and what the outputs should look like (e.g. only the mean is required, or some
high percentile). Each approach will have a set of assumptions that need to be thought through carefully:
What would the effect be if the assumptions are wrong? If we use a conservative assumption and estimate a risk that is too high, are we back to where we started? We need to think about data requirements
too: Is the quality likely to be good and are the data easily attainable? We also need to think about
software. I was once asked to review a 2-year, $3 million model written entirely in interacting C++
modules - nobody else had been able to figure it out (I couldn't either).
When the brainstorming is over, I recommend that you pass around a questionnaire to each expert
and ask those attending independently to answer a questionnaire something like this:

We discussed three risk analysis approaches (description A, description B, description
C). Please indicate your level of confidence (0 = none, 1 = slight, 2 = good, 3 =
excellent, -1 = no opinion) to the following:
1. What is your confidence that method A, B or C will be sufficiently flexible and
comprehensive to answer any foreseeable questions from the management about
this risk?


2. What is your confidence that method A, B or C is based on assumptions that are
correct?
3. What is your confidence for method A, B or C that the necessary data will be
available within the required timeframe and budget?
4. What is your confidence that the method A, B or C analysis will be completed in
time?
5. What is your confidence that there will be strong support for method A, B or C
among reviewing peers?
6. What is your confidence that there will be strong support for method A, B or C
among stakeholders?

Asking each brainstorming participant independently will help you attain a balanced view, particularly
if the chairperson of that meeting has enforced the discipline of requiring participants not to express their
view on the above questions during the meeting (it won't be completely possible, but you are trying to
make sure that nobody will be influenced into giving a desired answer). Asking people independently
rather than trying to achieve consensus during the meeting will also help remove the overconfidence
that often appears when people make a group decision.

3.5 Iterate
Things change. The political landscape in which the decision is to be made can become more hostile
or accepting to some assumptions, data can prove better or worse than we initially thought, new data
turn up, new questions suddenly become important, the timeframe or budget can change, a risk analysis
consultant sees an early model and shows you a simpler way, etc.
So it makes sense to go back from time to time over the types of assumption analysis I discussed
in Sections 3.2 and 3.3 and to remain open to taking a different approach, even to making as dramatic
a change as going from a quantitative to a qualitative risk analysis. That means you (analysts and
decision-makers alike) should also be a little guarded in making premature promises so you have some
space to adapt. In our consultancy contracts, for example, a client will usually commission us to do a
quantitative risk analysis and tell us about the data they have. We'll probably have had a little look at
the data too. We prefer to structure our proposal into stages. In the first stage we go over the decision
problem, review any constraints (time, money, political, etc.), take a first decent look at the available
data and figure out possible ways of getting to the answer. Then we produce a report describing how
we want to tackle the problem and why. At that stage the client can stop the work, continue with us,
do it themselves or maybe hire someone else if they wish. It may take a little longer (usually a day
or two), but everyone's expectations are kept realistic, we aren't cornered into doing a risk analysis
that we know is inappropriate and clients don't waste their time or money. As consultants, we are in
the somewhat privileged position of turning down work that we know would be terrible. A risk analyst
employed by a company or government department may not have that luxury. If you, the reader, are
a risk analyst in the awkward position of being made to produce terrible risk analyses, perhaps you
should show your boss this chapter, or maybe check to see if we have any vacancies.

Chapter 4

Choice of model structure
There is a tendency to settle on the form that a risk analysis model will take too early on in the risk
analysis process. In part that will be because of a limited knowledge of the available options, but also
because people tend not to take a step back and ask themselves what the purpose of the analysis is, and
also how it might evolve over time. In this chapter I give a short guide to various types of model used
in risk analysis.

4.1 Software Tools and the Models they Build
4.1.1 Spreadsheets
Spreadsheets, and by that I mean Excel these days, are the most natural and the first choice for most
people because it is perceived that relatively little additional knowledge is required to produce a risk
analysis model. Products like @RISK, Crystal Ball, ModelRisk and many other contenders for their
shared crown have made adding uncertainty into a spreadsheet as simple as clicking a few buttons. You
can run a simulation and look at the distribution results in a few seconds and a few more button clicks.
Monte Carlo simulation software tools for Excel have focused very much on the graphical interfaces to
make risk analysis modelling easy: combine that with the ability to track formulae across spreadsheets,
embed graphs and format sheets in many ways, and with VBA and data importing capabilities, and we
can see why Excel is so popular. I have even seen a whole trading floor run on Excel using VBA, and
not a single recognisable spreadsheet appeared on any dealer's screen.
But Excel has its limitations. ModelRisk overcomes many of them for high-level financial and insurance modelling, and I have used its features in this book a fair bit to help explain some modelling
concepts. However, there are many types of problem for which Excel is not suitable. Project cost and
schedule risk analysis can be done in spreadsheets at a crude level, which I cover in Chapter 19, and a
crude level is often enough for large-scale risk analysis, as we are rarely interested in the minutia that
can be built into a project planning model (like you might make with Primavera or Microsoft Project).
However, a risk register is better constructed in an electronic database with various levels of access.
The problem with building a project plan in a spreadsheet is that expanding the model into greater detail
becomes mechanically very awkward, while it is a simple matter in project planning software.
In other areas, risk analysis models with spreadsheets have a number of limitations:
1. They scale very badly, meaning that spreadsheets can become really huge when one has a lot of
data, or when one is performing repetitive calculations that could be succinctly written in another
language (e.g. a looping formula), although one can get round this to some degree with Visual
Basic. Our company reviews many risk models built in spreadsheets, and they can be vast, often
unnecessarily so because there are shortcuts to achieving the same result if one knows a bit of

   probability mathematics. The next version of Excel will handle even bigger sheets, so I predict this
   problem will only get worse.
2. They are limited to the two dimensions of a grid, three at a push if one uses sheets as a third
   dimension; if you have a multidimensional problem you should really think hard before deciding on
   a spreadsheet. There are a lot of other modelling environments one could use: C++ is highly flexible,
   but opaque to anyone who is not a C++ programmer. Matlab and, to a lesser extent, Mathematica
   and Maple are highly sophisticated mathematical modelling software with very powerful built-in
   modelling capabilities that will handle many dimensions and can also perform simulations.
3. They are really slow. Running a simulation in Excel will take hundreds or more times longer than
   specialised tools. That's a problem if you have a huge model, or if you need to achieve a high level
   of precision (i.e. require many iterations).
4. Simulation models built in spreadsheets calculate in one direction, meaning that, if one acquires
   new data that can be matched to a forecast in the model, the data cannot be integrated into the
   model to update the estimates of parameters on which the model was based and therefore produce
   a more accurate forecast. The simulation software WinBUGS can do this, and I give a number of
   examples through this book.
5. Spreadsheets cannot easily handle modelling dynamic systems. There are a number of flexible
   and user-friendly tools like Simul8 which give very good approximations to continuously varying
   stochastic systems with many interacting components. I give an example later in this chapter.
   Attempting to achieve the same in Excel is not worth the pain.

There are other types of model that one can build, and software that will let you do so easily, which
I describe below.

4.1.2 Influence diagrams
Influence diagrams are quite popular - they essentially replicate the mathematics you can build in
a spreadsheet, but the modelling environment is quite different (Figure 4.1 is a simple example).
Analytica is the most popular influence diagram tool. Variables (called nodes) are represented as
graphical objects (circles, squares, etc.) and are connected together with arrows (called arcs) which
show the direction of interaction between these variables. The visual result is a network that shows the
viewer which variables affect which, but you can imagine that such a diagram quickly becomes overly
complex, so one builds submodels. Click on a model object and it opens another view to show a lower
level of interaction. Personally, I don't like them much because the mathematics and data behind the
model are hard to get to, but others love them. They are certainly very visual.

Figure 4.1 Example of a simple influence diagram (nodes include project base cost, additional costs, inflation,
risk of bad weather, risk of strike, risk of political change and total project cost).

4.1.3 Event trees
Event trees offer a way to describe a sequence of probabilistic events, together with their probabilities
and impacts. They are perhaps the most useful of all the methods for depicting a probabilistic sequence,
because they are very intuitive, the mathematics to combine the probabilities is simple and the diagram
helps ensure the necessary discipline. Event trees are built out of nodes (boxes) and arcs (arrows)
(Figure 4.2).
The tree starts from the left with a node (in the diagram below, "Select animal" to denote the random
selection of an animal from some population), and arrows to the right indicate possible outcomes
(here, whether the animal is infected with some particular disease agent, or not) and their probabilities
(p, which would be the prevalence of infected animals in the population, and (1 - p) respectively).
Branching out from these boxes are arrows to the next probability event (the testing of an animal for
the disease), and attached to these arrows are the conditional probabilities of the next level of event
occurring. The conditional nature of the probabilities in an event tree is extremely important to underline.
In this example:
Se = P(tests positive for disease given the animal is infected)
Sp = P(tests negative for disease given the animal is not infected)

Thus, following the rules of conditional probability algebra, we can say, for example:

P(animal is infected and tests positive) = p*Se
P(animal is infected and tests negative) = p*(1 - Se)
P(animal tests positive) = p*Se + (1 - p)*(1 - Sp)
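
These identities are easy to check numerically; in the short sketch below the prevalence, sensitivity and
specificity values are invented for illustration, and the last line adds the conditional probability of infection
given a positive test, which follows from the same tree by Bayes' theorem.

```python
# Event-tree arithmetic for the animal-testing example. p, Se and Sp are
# invented illustrative values, not data from the text.
p, Se, Sp = 0.03, 0.95, 0.98   # prevalence, sensitivity, specificity

p_infected_and_positive = p * Se
p_infected_and_negative = p * (1 - Se)
p_positive = p * Se + (1 - p) * (1 - Sp)

# Conditional probability of infection given a positive test (Bayes' theorem)
p_infected_given_positive = p_infected_and_positive / p_positive

print(p_infected_and_positive)               # 0.0285
print(p_infected_and_negative)               # 0.0015
print(p_positive)                            # 0.0479
print(round(p_infected_given_positive, 3))   # 0.595
```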

Figure 4.2 Example of a simple event tree (nodes: select animal; infected or not infected with probabilities p
and (1 - p); then tests +ve or tests -ve with the conditional probabilities Se, (1 - Se), (1 - Sp) and Sp; the
probability of each complete sequence is the product of the probabilities along its branches).


Figure 4.3 Example of a simple decision tree. The decision options are to make either of two investments
or do nothing, with associated revenues as a result. More involved decision trees would include two or more
sequential decisions depending on how well the investment went.

Event trees are very useful for building up your probability thinking, although they will get quite
complex rather quickly. We use them a great deal to help understand and communicate a problem.

4.1.4 Decision trees
Decision trees are like event trees but add possible decision options (Figure 4.3). They have a role in
risk analysis, and in fields like petroleum exploration they are very popular. They sketch the possible
decisions that one might make and the outcomes that might result. Decision tree software (which can
also produce event trees) can calculate the best option to take under the assumption of some user-defined
utility function. Again, personally I am not a big fan of decision trees in actual model writing. I find that
it is difficult for decision-makers to be comfortable with defining a utility curve, so I don't have much
use for the analytical component of decision tree software, but they are helpful for communicating the
logic of a problem.
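
For what it is worth, the calculation such software automates is simple enough to sketch; the probabilities,
revenues and exponential utility function below are invented purely to illustrate choosing the option with
the highest expected utility.

```python
# Illustrative expected-utility comparison for the decision tree of Figure 4.3.
# Probabilities, revenues and the utility function are all invented.
import math

def utility(x, risk_tolerance=5e6):
    """A simple exponential (risk-averse) utility function."""
    return 1 - math.exp(-x / risk_tolerance)

options = {
    "Investment 1": [(0.6, 4_000_000), (0.4, -1_000_000)],   # (probability, net revenue)
    "Investment 2": [(0.3, 9_000_000), (0.7, -2_000_000)],
    "No investment": [(1.0, 0)],
}

expected_utilities = {
    name: sum(prob * utility(value) for prob, value in branches)
    for name, branches in options.items()
}

best = max(expected_utilities, key=expected_utilities.get)
print(expected_utilities)
print("Choose:", best)
```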

4.1.5 Fault trees
Fault trees start from the reverse approach to an event tree. An event tree looks forward from a starting
point and considers the possible future outcomes. A fault tree starts with the outcome and looks at
the ways it could have arisen. A fault tree is therefore constructed from the right with the outcome,
moves to the left with the possible immediate events that could have made that outcome arise, continues
backwards with the possible events that could have made the first set of events arise, etc.
Fault trees are very useful for focusing attention on what might go wrong and why. They have been
used in reliability engineering for a long time, but also have applications in areas like terrorism. For
example, one might start with the risk of deliberate contamination of a city's drinking water supply
and then consider routes that the terrorist could use (pipeline, treatment plant, reservoir, etc.) and the
probabilities of being able to do that given the security in place.
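
The probability arithmetic behind a fault tree reduces to AND and OR gates over (assumed independent)
basic events; the route and security-failure probabilities in this sketch are invented for illustration only.

```python
# Fault-tree gate arithmetic. For independent basic events, an AND gate
# multiplies probabilities and an OR gate is one minus the product of the
# complements. The probabilities below are invented illustrations.
from functools import reduce

def and_gate(probs):
    return reduce(lambda acc, p: acc * p, probs, 1.0)

def or_gate(probs):
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)

# P(route succeeds) = P(attempt via route) AND P(security on that route fails)
pipeline  = and_gate([0.02, 0.10])
treatment = and_gate([0.01, 0.05])
reservoir = and_gate([0.03, 0.20])

# Top event: contamination occurs via any route
p_contamination = or_gate([pipeline, treatment, reservoir])
print(round(p_contamination, 5))
```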

4.1.6 Discrete event simulation
Discrete event simulation (DES) differs from Monte Carlo simulation mainly in that it models the
evolution of a (usually stochastic) system over time. It does this by allowing the user to define equations


for each element in the model for how it changes, moves and interacts with other elements. Then it
steps the system through small time increments and keeps track of where all elements are at any time
(e.g. parts in a manufacturing system, passengers in an airport or ships in a harbour). More sophisticated
tools can increase the clock steps when nothing is happening, then decrease again to get a more accurate
approximation to the continuous behaviour it is modelling.
We have used DES for a variety of clients, one of which was a shipping firm that regularly received
LNG-ships at its site on a narrow shared waterway. The client wanted to investigate the impact of
constructing an alternative berthing system designed to reduce the impact of their activities on other
shipping movements, and the model evaluated the benefits of such a system. Within the DES model,
movements of the client's and any other relevant shipping traffic were simulated, taking into account
restrictions of movements by certain rules and regulations and evaluating the costs of delays. The standalone model, as well as documentation and training, was provided to the client and helped them to
persuade the other shipping operators and the Federal Energy Regulatory Commission (FERC) of the
effectiveness of their plan.
Figure 4.4 shows a screen shot of the model (it looks better in colour). Going from left to right, we
can see that currently there is one ship in the upper harbour, four in the inner harbour, none at the city
front and one in the outer harbour. In the client's berth, two ships are unloading with 1330 and 2430
units of materials still on board. In the upper right-hand corner the number of ships entering the shared
waterway is visible, including the number of ships that are currently in a queue (three and two ships of
a particular type). Finally, the lower right-hand corner shows the current waterway conditions, which
dictate some of the rules such as "only ships of a certain draft can enter or exit the waterway given a
particular current, tide, wind speed and visibility".
DES allows us to model extremely complicated systems in a simple way by defining how the elements
interact and then letting the model simulate what might happen. It is used a great deal to model, for
example, manufacturing processes, the spread of epidemics, all sorts of complex queuing systems, traffic
flows and crowd behaviour to design emergency exits. The beauty of a visual interface is that anyone
who knows the system can check whether it behaves as expected, which makes it a great communication
and validation tool.

Figure 4.4 Example of a DES model.
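
To give a flavour of the mechanics, here is a heavily simplified sketch of a single-berth queue written as
an event-list simulation; the exponential arrival and unloading times are invented, and a real model of the
waterway would of course carry the rules, tides and traffic interactions described above.

```python
# A stripped-down discrete event simulation of ships queuing for one berth.
# Exponential inter-arrival and unloading times are invented for illustration.
import heapq
import random

random.seed(7)

def simulate(mean_interarrival=6.0, mean_unload=5.0, horizon=10_000.0):
    events = [(random.expovariate(1 / mean_interarrival), "arrival")]
    berth_free_at = 0.0
    waits = []
    while events:
        time, kind = heapq.heappop(events)
        if time > horizon:
            break
        if kind == "arrival":
            # Ship waits if the berth is still occupied, then occupies it
            start = max(time, berth_free_at)
            waits.append(start - time)
            berth_free_at = start + random.expovariate(1 / mean_unload)
            # Schedule the next arrival
            heapq.heappush(events,
                           (time + random.expovariate(1 / mean_interarrival), "arrival"))
    return sum(waits) / len(waits)

print(round(simulate(), 2), "average hours waiting for the berth")
```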

4.2 Calculation Methods
Given a certain probability model that we wish to evaluate, there are several methods that we could use
to produce the required answer, which I describe below.

4.2.1 Calculating moments
This method uses some probability laws that are discussed later in this book. In particular it uses the
following rules:

1. The mean of the sum of two distributions is equal to the sum of their means, i.e.
   E(a + b) = E(a) + E(b) and E(a - b) = E(a) - E(b).
2. The mean of the product of two independent distributions is equal to the product of their means,
   i.e. E(a * b) = E(a) * E(b).
3. The variance of the sum of two independent distributions is equal to the sum of their variances,
   i.e. V(a + b) = V(a) + V(b) and V(a - b) = V(a) + V(b).
4. For a constant n, V(n * a) = n^2 * V(a) and E(n * a) = n * E(a).

The moments calculation method replaces each uncertain variable with its mean and variance and
then uses the above rules to estimate the mean and variance of the model's outcome.
So, for example, three variables a , b and c have the following means and variances:

a
b
c
If the problem is to calculate 2a

Mean = 70
Mean = 16
Mean = 12

Variance = 14
Variance = 2
Variance = 4

+ b - c, the result can be estimated as follows:
Mean = (2 *70) + 16 - 12 = 144
Variance = (22* 14) + 2 + 4 = 62

These two values are then used to construct a normal distribution of the outcome:
Result = Normal(144,

a)

where &? is the standard deviation of the distribution which is the square root of the variance.
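
A quick simulation cross-check of this calculation is shown below; normal shapes are assumed for a, b
and c purely for illustration, since the moment rules themselves use only the means and variances.

```python
# Simulation check of the moments calculation for 2a + b - c.
# Normal shapes are assumed only for illustration; the moment rules
# above need nothing more than the means and variances.
import random

random.seed(11)
N = 200_000

def sample():
    a = random.gauss(70, 14 ** 0.5)
    b = random.gauss(16, 2 ** 0.5)
    c = random.gauss(12, 4 ** 0.5)
    return 2 * a + b - c

values = [sample() for _ in range(N)]
mean = sum(values) / N
variance = sum((v - mean) ** 2 for v in values) / (N - 1)
print(round(mean, 1), round(variance, 1))   # close to 144 and 62
```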
This method is useful in certain situations, like the summation of a large number of potential risks
and in the determination of aggregate distributions (Section 11.2). It does have some fairly severe
limitations - it cannot easily cope with divisions, exponents, power functions, branching, etc. In short,


this technique becomes very difficult to execute for all but the most simple models that also reasonably
obey its set of assumptions.

4.2.2 Exact algebraic solutions
Each probability distribution has associated with it a probability distribution function that mathematically
describes its shape. Algebraic methods have been developed for determining the probability distribution
functions of some combinations of variables, so for simple models one may be able to find an equation
directly that describes the output distribution. For example, it is quite simple to calculate the probability
distribution function of the sum of two independent distributions (the following maths might not make
sense until you've read Chapter 6).
Let X be the first distribution with density f(x) and cumulative distribution function F_X(x), and let
Y be the second distribution with density g(x). Then the cumulative distribution function of the sum of
X and Y, F_{X+Y}, is given by

    F_{X+Y}(a) = \int_{-\infty}^{+\infty} F_X(a - x) g(x) dx        (4.1)

The sum of two independent distributions is sometimes known as the convolution of the distributions.
By differentiating this equation, we obtain the density function of X + Y:

    f_{X+Y}(a) = \int_{-\infty}^{+\infty} f(a - x) g(x) dx          (4.2)

So, for example, we can determine the distribution of the sum of two independent Uniform(0, 1)
distributions. The probability distribution functions f(x) and g(x) are both 1 for 0 ≤ x ≤ 1, and zero
otherwise. From Equation (4.2) we get

    f_{X+Y}(a) = \int_{0}^{1} f(a - x) dx

For 0 ≤ a ≤ 1, this yields

    f_{X+Y}(a) = \int_{0}^{a} dx = a

For 1 ≤ a ≤ 2, this yields

    f_{X+Y}(a) = \int_{a-1}^{1} dx = 2 - a

which is a Triangle(0, 1, 2) distribution.
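
A simple simulation cross-check of this algebra (not from the text) reproduces the triangular shape:

```python
# Simulation cross-check that Uniform(0,1) + Uniform(0,1) has a
# Triangle(0, 1, 2) density: f(a) = a on [0, 1] and 2 - a on [1, 2].
import random

random.seed(3)
N = 500_000
sums = [random.random() + random.random() for _ in range(N)]

def density_at(a, width=0.02):
    """Empirical density estimated from a narrow bin centred on a."""
    count = sum(1 for s in sums if a - width / 2 <= s < a + width / 2)
    return count / (N * width)

print(round(density_at(0.5), 2))   # close to 0.5
print(round(density_at(1.5), 2))   # close to 0.5
print(round(density_at(1.0), 2))   # close to 1.0, the peak of the triangle
```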
Thus, if our risk analysis model was just the sum of several simple distributions, we could use these
equations repeatedly to determine the exact output distribution. There are a number of advantages to


this approach, for example: the answer is exact; one can immediately see the effect of changing a
parameter value; and one can use differential calculus to explore the sensitivity of the output to the
model parameters.
A variation of the same approach is to recognise the relationships between certain distributions (for example, that the sum of independent exponential distributions is a gamma distribution). There are plenty of such relationships, and many are described in Appendix III, but nonetheless the distributions used in a risk analysis model don't usually allow such simple manipulation, and the exact algebraic technique becomes hugely complex and often intractable very quickly, so it cannot usually be considered a practical solution.

4.2.3 Numerical approximations
Some fast Fourier transform and recursive techniques have been developed for directly, and very accurately, determining the aggregate distribution of a random number of independent random variables. Much attention has been paid to this particular problem because it is central to the actuarial need to determine the aggregate claim payout an insurance company will face. However, the same generic problem occurs in banking and other areas. I describe these techniques in Section 11.2.2. There are other numerical
techniques that can solve certain types of problem, particularly via numerical integration. ModelRisk, for
example, provides the function VoseIntegrate which will perform a very accurate numerical integration.
Consider a function that relates the probability of illness, Pill(D), to the number of virus particles
ingested, D , as follows:


If we believed that the number of virus particles followed a Lognormal(100, 10) distribution, we could
calculate the probability of illness as follows:


where the VoseIntegrate function interprets "#" to be the variable to integrate over and the integration
is done between 1 and 1000. The answer is 2.10217E-05 - a value that we could only determine with
accuracy using Monte Carlo simulation by running a large number of iterations.
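
The same calculation can be sketched outside the spreadsheet. Note that neither the dose-response equation nor the exact VoseIntegrate formula is reproduced above, so the Python sketch below uses a purely hypothetical exponential dose-response Pill(D) = 1 - exp(-D/k), with an arbitrary assumed constant k, just to show the shape of the computation; it will not reproduce the 2.10217E-05 answer quoted in the text. The Lognormal(100, 10) is interpreted as having mean 100 and standard deviation 10 and converted to scipy's parameterisation.

import numpy as np
from scipy import stats, integrate

# Hypothetical dose-response purely for illustration (the book's actual
# equation is not reproduced here); k is an arbitrary assumed constant.
k = 1e7
def p_ill(dose):
    return 1.0 - np.exp(-dose / k)

# Lognormal with mean 100 and standard deviation 10, converted to scipy's
# (s, scale) parameters.
mean, sd = 100.0, 10.0
sigma2 = np.log(1.0 + (sd / mean) ** 2)
mu = np.log(mean) - sigma2 / 2.0
dose_dist = stats.lognorm(s=np.sqrt(sigma2), scale=np.exp(mu))

# Expected probability of illness = integral of Pill(D) * f(D) dD over [1, 1000]
result, err = integrate.quad(lambda d: p_ill(d) * dose_dist.pdf(d), 1, 1000)
print(result)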

4.2.4 Monte Carlo simulation
This technique involves the random sampling of each probability distribution within the model to produce
hundreds or even thousands of scenarios (also called iterations or trials). Each probability distribution is
sampled in a manner that reproduces the distribution's shape. The distribution of the values calculated
for the model outcome therefore reflects the probability of the values that could occur. Monte Carlo
simulation offers many advantages over the other techniques presented above:
• The distributions of the model's variables do not have to be approximated in any way.
• Correlation and other interdependencies can be modelled.
• The level of mathematics required to perform a Monte Carlo simulation is quite basic.
• The computer does all of the work required in determining the outcome distribution.
• Software is commercially available to automate the tasks involved in the simulation.
• Complex mathematics can be included (e.g. power functions, logs, IF statements, etc.) with no extra difficulty.
• Monte Carlo simulation is widely recognised as a valid technique, so its results are more likely to be accepted.
• The behaviour of the model can be investigated with great ease.
• Changes to the model can be made very quickly and the results compared with previous models.
Monte Carlo simulation is often criticised as being an approximate technique. However, in theory at
least, any required level of precision can be achieved by simply increasing the number of iterations in a
simulation. The limitations are in the number of random numbers that can be produced from a random
number generating algorithm and, more commonly, the time a computer needs to generate the iterations.
For a great many problems, these limitations are irrelevant or can be avoided by structuring the model
into sections.
The value of Monte Carlo simulation can be demonstrated by considering the cost model problem of
Figure 4.5. Triangular distributions represent the uncertain variables in the model. There are many other, very intuitive, distributions in common use (Figure 4.6 gives some examples) that require little or no probability knowledge to understand. The cumulative distribution of the results is shown in Figure 4.7, along with the distribution of the values that are generated from running a "what if" scenario analysis using three values as discussed at the beginning of this chapter. The figure shows that the Monte Carlo outcome does not have anywhere near as wide a range as the "what if" analysis. This is because the "what if" analysis effectively gives equal probability weighting to all scenarios, including those where all costs turned out to be at their maximum and those where all costs turned out to be at their minimum. Let us take, for a minute, the maximum to mean the value that has only a 1 % chance of being exceeded (say). The probability that all five costs could be at their maximum at the same time would then equal (0.01)^5, or 1:10 000 000 000: not a realistic outcome! Monte Carlo simulation therefore provides results that are also far more realistic than those that are produced by simple "what if" scenarios.


Total construction costs

                          Minimum      Best guess    Maximum
Excavation                £ 23 500     £ 27 200      £ 31 100
Foundations               £172 000     £178 000      £189 000
Structure                 £ 56 200     £ 58 500      £ 63 700
Roofing                   £ 29 600     £ 37 200      £ 43 600
Services and finishes     £ 30 500     £ 33 200      £ 37 800

Figure 4.5 Construction project cost model.
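
To give a feel for what the simulation itself involves, here is a minimal Python sketch of this cost model, using the minimum/best guess/maximum values as reconstructed in the table above and numpy's triangular sampler in place of a spreadsheet add-in:

import numpy as np

rng = np.random.default_rng(1)

# (minimum, best guess, maximum) for each cost item, from Figure 4.5
costs = {
    "Excavation":            (23_500, 27_200, 31_100),
    "Foundations":           (172_000, 178_000, 189_000),
    "Structure":             (56_200, 58_500, 63_700),
    "Roofing":               (29_600, 37_200, 43_600),
    "Services and finishes": (30_500, 33_200, 37_800),
}

iterations = 10_000
total = np.zeros(iterations)
for low, mode, high in costs.values():
    total += rng.triangular(low, mode, high, size=iterations)

# Summarise the output distribution of total construction cost
print("mean: £%.0f" % total.mean())
print("5th and 95th percentiles:", np.percentile(total, [5, 95]).round(0))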

Figure 4.6 Examples of intuitive and simple probability distributions: (a) Triangle distributions; (b) Uniform distributions; (c) PERT distributions; (d) a Relative distribution; (e) a Cumulative Ascending distribution; (f) a Discrete distribution.

Figure 4.7 Comparison of distributions of results from "what if" and risk analyses (cumulative probability against total project cost, roughly £310 000 to £370 000).

4.3 Uncertainty and Variability
"Variability is a phenomenon in the physical world to be measured, analysed and where
appropriate explained. By contrast, uncertainty is an aspect of knowledge".
Sir David Cox
There are two components to our inability to predict precisely what the future holds: variability and uncertainty. This is a difficult subject, not least because of the words that we risk analysts have available to describe the various concepts and how these words have been used rather carelessly. Bearing this in mind, a good start will be to define the meaning of various keywords. I have used the now fairly standard meanings for uncertainty and variability, but might be considered to be deviating a little from the common path in my explanation of the units of uncertainty and variability. The reader should bear in mind the comments I'll make about the different meanings that various disciplines assign to certain words. As long as the reader keeps the concepts clear, it should be an easy enough task to work out what another author means even if some of the terminology is different.
Variability
Variability is the effect of chance and is a function of the system. It is not reducible through either
study or further measurement, but may be reduced by changing the physical system. Variability has
been described as "aleatory uncertainty", "stochastic variability" and "interindividual variability".
Tossing a coin a number of times provides us with a simple illustration of variability. If I toss the
coin once, I will have a head (H) or tail (T), each with a probability of 50% if one presumes a fair
coin. If I toss the coin twice, I have four possible outcomes {HH, HT, TH, TT}, each with a probability
of 25 % because of the coin's symmetry. We cannot predict with certainty what the tosses of a coin will
produce because of the inherent randomness of the coin toss.


The variation among a population provides us with another simple example. If I randomly select
people off the street and note some physical characteristic, like their height, weight, sex, whether they
wear glasses, etc., the result will be a random variable with a probability distribution that matches the
frequency distribution of the population from which I am sampling. So, for example, if 52 % of the
population are female, a randomly sampled person will be female with a probability of 52 %.
In the nineteenth century a rather depressing philosophical school of thought, usually attributed to the
mathematician Marquis Pierre-Simon de Laplace, became popular, which proposed that there was no
such thing as variability, only uncertainty, i.e. that there is no randomness in the world and an omniscient being or machine, a "Laplace machine", could predict any future event. This was the foundation
of the physics of the day, Newtonian physics, and even Albert Einstein believed in determinism of the
physical world, saying the often quoted "Der Herr Gott würfelt nicht" - "God does not play dice".
Heisenberg's uncertainty principle, one of the foundations of modern physics and, in particular, quantum mechanics, shows us that this is not true at the molecular level, and therefore subtly at any greater
scale. In essence, it states that, the more one characteristic of a particle is constrained (for example, its
location in space), the more random another characteristic becomes (if the first characteristic is location,
the second will be its velocity). Einstein tried to prove that it is our knowledge of one characteristic
that we are losing as we gain knowledge of another characteristic, rather than any characteristic being
a random variable, but he has subsequently been proven wrong both theoretically and experimentally.
Quantum mechanics has so far proved itself to be very accurate in predicting experimental outcomes
at the molecular level where the predictable random effects are most easily observed, so we have a lot
of empirical evidence to support the theory. Philosophically, the idea that everything is predetermined
(i.e. the world is deterministic) is very difficult to accept too, as it deprives us humans of free will.
The non-existence of free will would in turn mean that we are not responsible for our actions - we are
reduced to complicated machines and it is meaningless to be either praised or punished for our deeds
and misdeeds, which of course is contrary to the principles of any civilisation or religion. Thus, if one
accepts the existence of free will, one must also accept an element of randomness in all things that
humans affect. Popper (1988) offers a fuller discussion of the subject.
Sometimes systems are too complex for us to understand properly. For example, stock markets produce varying stock prices all the time that appear random. Nobody knows all the factors that influence
a stock price over time - it is essentially infinitely complex and we accept that this is best modelled as
a random process.
Uncertainty

Uncertainty is the assessor's lack of knowledge (level of ignorance) about the parameters that characterise the physical system that is being modelled. It is sometimes reducible through further measurement
or study, or by consulting more experts. Uncertainty has also been called "fundamental uncertainty",
"epistemic uncertainty" and "degree of belief". Uncertainty is by definition subjective, as it is a function of the assessor, but there are techniques available to allow one to be "objectively subjective".
This essentially amounts to a logical assessment of the information contained in available data about
model parameters without including any prior, non-quantitative information. The result is an uncertainty
analysis that any logical person should agree with, given the available information.
Total uncertainty

Total uncertainty is the combination of uncertainty and variability. These two components act together
to erode our ability to be able to predict what the future holds. Uncertainty and variability are philosophically very different, and it is now quite common for them to be kept separate in risk analysis

Chapter 4 Choice of model structure

49

modelling. Common mistakes are failure to include uncertainty in the model, or modelling variability
in some parts of the model as if it were uncertainty. The former will provide an overconfident (i.e.
insufficiently spread) model output, while the latter can grossly overinflate the total uncertainty.
Unfortunately, as you will have gathered, the term "uncertainty" has been applied to both the meaning
described above and total uncertainty, which has left the risk analyst with some problems of terminology.
Colleagues have suggested the word "indeterminability" to describe total uncertainty (perhaps a bit of
a mouthful, but still the best suggestion I've heard so far). There has been a rather protracted argument
between traditional (frequentist) and Bayesian statisticians over the meaning of words like probability,
frequency, confidence, etc. Rather than go through their various interpretations here, I will simply present
you with how I use these words. I have found my terminology helps clarify my thoughts and those of
my clients and course participants very well. I hope they will do the same for you.
Probability

Probability is a numerical measurement of the likelihood of an outcome of some stochastic process.
It is thus one of the two components, along with the values of the possible outcomes, that describe
the variability of a system. The concept of probability can be developed neatly from two different
approaches. The frequentist approach asks us to imagine repeating the physical process an extremely
large number of times (trials) and then to look at the fraction of times that the outcome of interest
occurs. That fraction is asymptotically (meaning as we approach an infinite number of trials) equal to
the probability of that particular outcome for that physical process. So, for example, the frequentist
would imagine that we toss a coin a very large number of times. The fraction of the tosses that comes
up heads is approximately the true probability of a single toss producing a head, and, the more tosses we
do, the closer the fraction becomes to the true probability. So, for a fair coin we should see the number
of heads stabilise at around 50 % of the trials as the number of trials gets truly huge. The philosophical
problem with this approach is that one usually does not have the opportunity to repeat the scenario a
very large number of times.
The physicist or engineer, on the other hand, could look at the coin, measure it, spin it, bounce lasers
off its surface, etc., until one could declare that, owing to symmetry, the coin must logically have a
50 % probability of falling on either surface (for a fair coin, or some other value for an unbalanced coin
as the measurements dictated).
Probability is used to define a probability distribution, which describes the range of values the variable
may take, together with the probability (likelihood) that the variable will take any specific value.
Degree of uncertainty

In this context, "degree of uncertainty" is our measure of how much we believe something to be true.
It is one of the two components, along with the plausible values of the parameter, that describe the
uncertainty we may have about the parameter of the physical system ("the state of nature", if you
like) to be modelled. We can thus use the degree of uncertainty to define an uncertainty distribution,
which describes the range of values within which we believe the parameter lies, as well as the level
of confidence we have about the parameter being any particular value, or lying within any particular
range. A distribution of confidence looks exactly the same as a distribution of probability, and this can
lead, all too easily, to confusion between the two quantities.
Frequency

Frequency is the number of times a particular characteristic appears in a population. Relative frequency
is the fraction of times the characteristic appears in the population. So, in a population of 1000 people,


22 of whom have blue eyes, the frequency of blue eyes is 22 and the relative frequency is 0.022 or
2.2 %. Frequency, by the definition used here, must relate to a known population size.

4.3.1 Some illustrations of uncertainty and variability
Let us look at a couple of examples to clarify the meaning of uncertainty and variability. Since variability
is the more fundamental concept, we'll deal with it first. If I toss a fair coin, there is a 50 % chance that
each toss will come up heads (let's call this a "success"). The result of each toss is independent of the
results of any previous tosses, and it turns out that the probability distribution of the number of heads
in n tosses of a fair coin is described by a Binomial(n, 50%) distribution, which will be explained in
detail in Section 8.2. Figure 4.8 illustrates this binomial distribution for n = 1, 2, 5 and 10. This is a
distribution of variability because I am not a machine, so I am not perfectly repetitive, and the system
(the number of times the coin spins, the air resistance and movement, the angle at which it hits the
ground, the topology of the ground, etc.) is too complicated for me to attempt to influence the outcome,
and the tosses are therefore random.
These binomial distributions are distributions of variability and reflect the randomness inherent in
the tossing of a coin (our stochastic system). We are assuming that there is no uncertainty here, as
we are assuming the coin to be fair and we are defining the number of tosses; in other words, we
are assuming the parameters of the system to be exactly known. The vertical axis of Figure 4.8 gives
the probability of each result, and, naturally, these probabilities add up to 1. In general, probability
distributions or distributions of variability are simple to understand. They give me some comfort that

Figure 4.8 Examples of the Binomial(n, 50 %) distribution (n = 1, 2, 5 and 10; probability against number of successes).

Figure 4.9 Confidence distributions for the ball in the box being black: 0 = No, 1 = Yes. The left panel is confidence before any ball is revealed; the right panel is confidence after seeing a blue ball removed from the sack.
randomness (variability) really does exist in the world: if we take a group of 100 people¹ and ask them to toss a coin 10 times, the resulting distribution of the number of heads will closely follow a Binomial(10, 50 %).

¹ I have done this a couple of times with a big lecture group in a large banked auditorium. I don't recommend the experiment - the coins went everywhere.

Now let us look at a distribution of uncertainty. Imagine I have a sack of 10 balls, six of which are black and the remaining four of which are blue, and I know these figures. Now imagine that, out of my sight, a ball is randomly selected from the sack and placed in an opaque box. I am asked the question: "What is the probability that the ball in the box is black?", and I could quickly answer 6/10 or 60 %. Then another ball is removed from the sack and shown to me: it is blue. I am asked: "Now what is the probability that the ball in the box is black?", and, as there are now a total of nine balls I have not seen, six of which I know are black, I could answer 6/9 or 66.66 %. But that is strange, because it is hard to believe that the probability of the ball in the box being black has changed because of events that occurred after its selection. The problem lies in my use of the term "probability", which is inconsistent with the definition I have given above. When the ball has been placed in the box, the deed is done: either the ball in the box is black (i.e. the probability is 1) or it is not (i.e. the probability is 0). I don't know the truth, but I could collect information (i.e. look in the box or look in the sack) to find out what the true answer is. Before any ball was revealed to me, I should have said that I was 60 % confident that the probability was 1, and therefore 40 % confident that the probability was 0. This is an uncertainty distribution of the true probability. Now, when the blue ball was revealed to me from the sack, I had extra information and would therefore change my uncertainty distribution to show a 66.66 % confidence that the ball in the box was black (i.e. that the probability was 1). These two confidence distributions are shown in Figure 4.9. Note that the distributions of Figure 4.9 are pure uncertainty distributions: the uncertain quantity is a probability that may only take a value of 0 or 1, and it has no variability - its outcome is deterministic.

4.3.2 Combining uncertainty and variability in a risk analysis

To all intents and purposes, uncertainty and variability are described by distributions that look and behave exactly the same. One might therefore reasonably conclude that they can be combined in the same Monte Carlo model: some distributions reflecting the uncertainty about certain parameters in

the model, the other distributions reflecting the inherent stochastic nature of the system. We could then
run a simulation on such a model which would randomly sample from all the distributions and our
output would therefore take account of all uncertainty and variability. Unfortunately, this does not work out completely. The resultant single distribution is equivalent to our "best-guess" distribution of the composite of the two components. Technically, it is difficult to interpret, as the vertical scale represents neither uncertainty nor variability, and we have lost the information about what component of the resultant distribution is due to the inherent randomness (variability) of the system and what component is due to our ignorance of that system. It is therefore useful to know how to keep these two components separate in an analysis if necessary.
Why separate uncertainty and variability?

Keeping uncertainty and variability separate in a risk analysis model is mathematically more correct.
Mixing the two together, i.e. by simulating them together, produces a reasonable estimate of the level
of total uncertainty under most conditions. Figure 4.10 shows a Binomial(10, p) distribution, where p is uncertain with distribution Beta(10, 10). The spaghetti-looking graph represents a number of possible true binomial distributions, shown in cumulative form, and the bold line shows the result one gets from simulating the binomial and beta distributions together. The combined model may be wrong, but it covers the possible range very well. But consider doing the same with just one binomial trial, e.g. Binomial(1, Beta(10, 10)). The result is either a 1 or a 0, each occurring in about 50 % of the iterations of the simulation run - the same result as we would have had by modelling Binomial(1, 50 %). The output has lost the information that p is uncertain.
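
The point can be reproduced with a few lines of Python (a sketch of my own, not the book's spreadsheet models): sampling p from Beta(10, 10) and then a Binomial(1, p) outcome gives roughly 50 % ones, indistinguishable from Binomial(1, 0.5), whereas keeping the Beta samples separate (one binomial distribution per sampled p) retains the uncertainty about p.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_iter = 100_000

# Mixed (single-loop) simulation: uncertainty and variability together
p_samples = rng.beta(10, 10, size=n_iter)
single_trials = rng.binomial(1, p_samples)
print("Mixed model, fraction of 1s:", single_trials.mean())   # ~0.5, p-uncertainty lost

# Second-order view: keep each sampled p separate and look at the spread
# of the Binomial(10, p) distributions it implies (the "spaghetti" of Figure 4.10)
for p in rng.beta(10, 10, size=5):                  # a few of the 300 curves
    cdf = stats.binom.cdf(np.arange(11), 10, p)
    print(f"p = {p:.2f}: P(successes <= 5) = {cdf[5]:.3f}")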
Mixing uncertainty and variability means, of course, that we cannot see how much of the total
uncertainty comes from variability and how much from uncertainty, and that information is useful.
If we know that a large part of the total uncertainty is due to uncertainty (as in the example of Figure 4.11), then we know that collecting further information, thereby reducing uncertainty, would enable us to improve our estimate of the future. On the other hand, if the total uncertainty is nearly all due to variability (as in the example of Figure 4.12), we know that it is a waste of time to collect more

Figure 4.10 300 Binomial(10, p) distributions resulting from random samples of p from a Beta(10, 10) distribution.

Figure 4.11 Example of second-order risk analysis model output with uncertainty dominating variability (cumulative probability against lifetime).

Figure 4.12 Example of second-order risk analysis model output with variability dominating uncertainty (cumulative probability against lifetime).

information and the only way to reduce the total uncertainty would be to change the physical system.
In general, the separation of uncertainty and variability allows us to understand what steps can be taken
to reduce the total uncertainty of our model, and allows us to gauge the value of more information or
of some potential change we can make to the system.
A much larger problem than mixing uncertainty and variability distributions together can occur when
a variability distribution is used as if it were an uncertainty distribution. Separating uncertainty and
variability very deliberately gives us the discipline and understanding to avoid the much larger errors
that this mistake will produce. Consider the following problem.


A group of 10 jurors is randomly picked from a population for some court case. In this population,
50 % are female, 0.2 % have severe visual disability and 1.1 % are Native American. The defence would
like to have at least one member on the jury who is female and either Native American or visually
disabled or both. What is the probability that there will be at least one such juror in the selection? This is a
pure variability problem, as all the parameters are considered well known and the answer is quite easy to
calculate, assuming independence between the characteristics. The probability that a person is not Native American and not visually disabled is (100 % - 1.1 %) × (100 % - 0.2 %) = 98.7022 %. The probability that a person is either Native American or visually disabled or both is (100 % - 98.7022 %) = 1.2978 %. Thus, the probability that a person is both female and either Native American or visually disabled or both is (50 % × 1.2978 %) = 0.6489 %. The probability that none of the 10 potential jurors is such a person is then (100 % - 0.6489 %)^10 = 93.697 %, and so, finally, the probability that at least one potential juror is both female and either Native American or visually disabled or both is (100 % - 93.697 %) = 6.303 %.
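
The arithmetic is easily checked, and a quick simulation of the same problem (a sketch of my own in Python) converges on the same 6.3 % answer, confirming that the output is a single probability rather than a distribution:

import numpy as np

p_female, p_vis, p_na = 0.5, 0.002, 0.011

# Exact calculation, as in the text
p_na_or_vis = 1 - (1 - p_na) * (1 - p_vis)          # 1.2978 %
p_target = p_female * p_na_or_vis                   # 0.6489 %
p_at_least_one = 1 - (1 - p_target) ** 10           # 6.303 %
print("exact:", p_at_least_one)

# Monte Carlo check: this is pure variability, so it estimates that same single figure
rng = np.random.default_rng(1)
n_iter = 200_000
juries = rng.random((n_iter, 10, 3))   # columns: female, visually disabled, Native American
match = (juries[:, :, 0] < p_female) & (
        (juries[:, :, 1] < p_vis) | (juries[:, :, 2] < p_na))
print("simulated:", match.any(axis=1).mean())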
Now let's compare this calculation with the spreadsheet of Figure 4.13 and the result it produces in
Figure 4.14. In this model, the number of females in the jury has been simulated, but the rest of the
calculation has been explicitly calculated. The output thus has a distribution that is meaningless since it
should be a single figure. The reason for this is that the model both calculated and simulated variability.
We are treating the number of females as if it were an uncertain parameter rather than a variable.
Now, having said how useful it is to separate uncertainty and variability, we must take a step back
and ask whether the effort is worth the extra information that can be gained. In truth, if we run
simulations that combine uncertainty and variability in the same simulation, we can get a good idea
of their contribution to total uncertainty by running the model twice: the first time sampling from all
distributions, and the second time setting all the uncertainty distributions to their mean value. The
difference in spread is a reasonable description of the contribution of uncertainty to total uncertainty.
Writing a model where uncertainty and variability are kept separate, as described in the next section,


Figure 4.13 Example of model that incorrectly mixes uncertainty and variability.

Figure 4.14 Result of the model of Figure 4.13.

can be very time consuming and cumbersome, so we must keep an eye out for the value of such an
exercise.

4.3.3 Structuring a Monte Carlo model to separate uncertainty and variability
The core structure of a risk analysis model is the variability of the stochastic system. Once this variability
model has been constructed, the uncertainty about parameters in that variability model can be overlaid. A
risk analysis model that separates uncertainty and variability is described as second order. A variability
model comes in two forms: explicit calculation and simulation. In a variability model with explicit
calculation, the probability of each possible outcome is explicitly calculated. So, for example, if one
were calculating the number of heads in 10 tosses of a coin, the explicit calculation model would take the

Figure 4.15 Model calculating the outcome of 10 tosses of a coin.

form of the spreadsheet in Figure 4.15. Here, we have used the Excel function BINOMDIST(x, n, p,
cumulative) which returns the probability of x successes in n trials with a binomial probability of
success p . The cumulative parameter requires either a TRUE (or 1) or a FALSE (or 0): using TRUE,
the function returns the cumulative probability F(x), using FALSE the function returns the probability
mass f (x). Plotting columns E and F together in an x-y scatter plot produces the binomial distribution
which can be the output of the model. Statistical results, like the mean and standard deviation shown in
the spreadsheet model, can also be determined explicitly as needed. The formulae calculating the mean
and standard deviation use an Excel array function SUMPRODUCT which multiplies terms in the two
arrays pair by pair and then sums these pair products. In an explicitly calculated model like this it is
a simple matter to include uncertainty about any parameters of the model. For example, if we are not confident that the coin was truly fair but instead wish to describe our estimate of the probability of heads as a Beta(12, 11) distribution (see Section 8.2.3 for an explanation of the beta distribution in this context), we can simply enter the beta distribution in place of the 0.5 value in cell C3 and simulate for the cells in column F containing the outputs.
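
An equivalent of that explicit-calculation spreadsheet, sketched in Python rather than Excel, is shown below: the BINOMDIST and SUMPRODUCT steps become scipy's binomial probability mass function and a couple of dot products, and the Beta(12, 11) uncertainty about p can be overlaid by recomputing the pmf for sampled values of p.

import numpy as np
from scipy import stats

n = 10
x = np.arange(n + 1)

# Explicit variability model for a fair coin (the BINOMDIST column)
pmf = stats.binom.pmf(x, n, 0.5)
mean = np.dot(x, pmf)                               # SUMPRODUCT equivalent
sd = np.sqrt(np.dot((x - mean) ** 2, pmf))
print(f"fair coin: mean = {mean:.2f}, sd = {sd:.3f}")

# Overlaying uncertainty: replace 0.5 with samples from Beta(12, 11)
rng = np.random.default_rng(1)
for p in rng.beta(12, 11, size=3):                  # a few uncertainty samples
    pmf_p = stats.binom.pmf(x, n, p)
    print(f"p = {p:.3f}: mean = {np.dot(x, pmf_p):.2f}")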
The separation of uncertainty and variability is simple and clear when using a model that explicitly
calculates the variability, as we use formulae for the variability and simulation for the uncertainty. But
what do we do if the model is set up to simulate the variability? Figure 4.16 shows the same coin tossing
problem, but now we are simulating the number of heads using a Binomial(n, p ) function in @RISK.
Admittedly, it seems rather unnecessary here to simulate such a simple problem, but in many circumstances it is extremely unwieldy, if not impossible, to use explicit calculation models, and simulation is
the only feasible approach. Since we are using the random sampling of simulation to model the variability, it is no longer available to us to model uncertainty. Let us imagine that we put a possible value for the
binomial probability p into the model and run a simulation. The result is the binomial distribution that
would be the correct model of variability if that value of p were correct. Now, we believe that p could
actually be quite a different value - our confidence about the true value of p is described by a Beta(l2,
11) distribution - so we would really like to take repeated samples from the beta distribution, run a
simulation for each sample and plot all the binomial distributions together to give us a true picture. This
sounds immensely tedious, but @RISK provides a RiskSimtable function that will automate the process.
Crystal Ball also provides a similar facility in its Pro version that allows one to nominate uncertainty
and variability distributions within a model separately and then completely automates the process.
We proceed by taking (say) 50 Latin hypercube samples from the beta distribution and then importing them back into the spreadsheet model. We then use a RiskSimtable function to reference the list of values.

Figure 4.16 A simulation version of the model of Figure 4.15.


The RiskSimtable function returns the first value in the list, but when we instruct @RISK to run 50
simulations, each of say 500 iterations, the RiskSimtable function will go through the list, using one
value at a time for each simulation. Note that the number of simulations is set to equal the number of
samples we have from the beta uncertainty distribution. The binomial distribution is then linked to the
RiskSimtable function and named as an output. We now run the 50 simulations and produce 50 different
possible binomial distributions which can be plotted together and analysed in much the same way as an
explicit calculation output. Of course, there are an infinite number of possible binomial distributions,
but, by using Latin hypercube sampling (see Section 4.4.3 for an explanation of the value of Latin
hypercube sampling), we are ensuring that we get a good representation of the uncertainty with a few
simulations.
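
Outside of @RISK or Crystal Ball, the same second-order structure can be emulated with a nested loop: an outer loop over (here, stratified) samples from the Beta(12, 11) uncertainty distribution, and an inner simulation of the Binomial(10, p) variability for each one. The sketch below is a rough Python analogue of the RiskSimtable workflow described above, not the add-ins' own code.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_outer, n_inner = 50, 500

# Stratified (LHS-like) samples of p from the Beta(12, 11) uncertainty distribution:
# one sample from each of 50 equal-probability bands, in random order
u = (np.arange(n_outer) + rng.random(n_outer)) / n_outer
p_samples = stats.beta.ppf(rng.permutation(u), 12, 11)

# For each uncertainty sample, run an inner simulation of the variability
results = [rng.binomial(10, p, size=n_inner) for p in p_samples]

# Each row of `results` is one possible Binomial(10, p) outcome distribution;
# plotted together they form the second-order "spaghetti" picture
print("means of the first few inner simulations:",
      np.round([r.mean() for r in results[:5]], 2))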
In spite of the automation provided by the RiskSimtable function in @RISK or the facilities of Crystal
Ball Pro and the speed of modern computers, the simulations can take some time. However, in most
non-trivial models that time is easily balanced by the reduction in complexity of the model itself, and therefore the time it takes to construct, as well as the more intuitive manner in which the models can be built, which greatly helps in avoiding errors.
The ModelRisk software makes uncertainty analysis much easier, as all its fitting functions offer
the option of either returning best fitting parameters (or distributions, time series, etc., based on best
fitting parameters), which is more common practice, or including the statistical uncertainty about those
parameters, which is more correct.

4.4 How Monte Carlo Simulation Works
This section looks at the technical aspects of how Monte Carlo risk analysis software generates random
samples for the input distributions of a model. The difference between Monte Carlo and Latin hypercube
sampling is explained. An illustration of the improvement in reliability and efficiency of Latin hypercube
sampling over Monte Carlo is also presented. The use of a random number generator seed is explained,
and the reader is shown how it is possible to generate probability distributions of one's own design.
Finally, a brief introduction is given into the methods used by risk analysis software to produce rank
order correlation of input variables.

4.4.1 Random sampling from input distributions
Consider the distribution of an uncertain input variable x. The cumulative distribution function F(x), defined in Section 6.1.1, gives the probability P that the variable X will be less than or equal to x, i.e.

F(x) = P(X ≤ x)

F(x) obviously ranges from 0 to 1. Now, we can look at this equation in the reverse direction: what is the value of x for a given value of F(x)? This inverse function G(F(x)) is written as

G(F(x)) = x

It is this concept of the inverse function G(F(x)) that is used in the generation of random samples from each distribution in a risk analysis model. Figure 4.17 provides a graphical representation of the relationship between F(x) and G(F(x)).


Figure 4.17 The relationship between x, F(x) and G(F(x)).

To generate a random sample for a probability distribution, a random number r is generated between 0 and 1. This value is then fed into the inverse function to determine the value to be generated for the distribution:

x = G(r)

The random number r is generated from a Uniform(0, 1) distribution to provide equal opportunity of an x value being generated in any percentile range. The inverse function concept is employed in a number of sampling methods, discussed in the following sections. In practice, for some types of probability distribution it is not possible to determine an equation for G(F(x)), in which case numerical solving techniques can be employed.
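
For a distribution whose inverse cumulative function has a closed form, the inversion method is only a couple of lines. The sketch below (my own illustration in Python) samples an Exponential distribution with mean beta by inverting F(x) = 1 - exp(-x/beta):

import numpy as np

rng = np.random.default_rng(1)
beta = 2.0                        # mean of the exponential distribution

r = rng.random(100_000)           # Uniform(0, 1) random numbers, r = F(x)
x = -beta * np.log(1.0 - r)       # x = G(r), the inverse of F(x) = 1 - exp(-x/beta)

print(x.mean())                   # close to beta = 2.0
print(np.quantile(x, 0.5))        # close to the true median, beta * ln(2) ~ 1.386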
ModelRisk uses the inversion method for all of its 70+ families of univariate distributions and allows
the user to control how the distribution is sampled via its "U-parameter". For example:
VoseNormal(mu, sigma, U)
where mu and sigma are the mean and standard deviation of the normal distribution;
VoseNormal(mu, sigma, 0.9)
returns the 90th percentile of the distribution;
VoseNormal(mu, sigma)

for ModelRisk users,

VoseNormal(mu, sigma, RiskUniform(0, 1))

for @RISK users, or

VoseNormal(mu, sigma, CB.Uniform(0, 1))

for Crystal Ball users, etc., returns random samples from the distribution that are controlled by ModelRisk, @RISK or Crystal Ball respectively. The inversion method also allows us to make use of copulas
to correlate variables, as explained in Section 13.3.

4.4.2 Monte Carlo sampling
Monte Carlo sampling uses the above sampling method exactly as described. It is the least sophisticated
of the sampling methods discussed here, but is the oldest and best known. Monte Carlo sampling got its name as the code word for work that von Neumann and Ulam were doing during World War II on the Manhattan Project at Los Alamos for the atom bomb, where it was used to integrate otherwise intractable mathematical functions (Rubinstein, 1981). However, one of the earliest examples of the use of the Monte Carlo method was in the famous Buffon's needle problem, where needles were physically thrown randomly onto a gridded field to estimate the value of π. At the beginning of the twentieth century the Monte Carlo method was also used to examine the Boltzmann equation, and in 1908 the famous statistician Student (W. S. Gossett) used the Monte Carlo method for estimating the correlation coefficient in his t-distribution.
Monte Carlo sampling satisfies the purist's desire for an unadulterated random sampling method.
It is useful if one is trying to get a model to imitate a random sampling from a population or for
doing statistical experiments. However, the randomness of its sampling means that it will over- and
undersample from various parts of the distribution and cannot be relied upon to replicate the input
distribution's shape unless a very large number of iterations are performed.
For nearly all risk analysis modelling, the pure randomness of Monte Carlo sampling is not really
relevant. We are almost always far more concerned that the model will reproduce the distributions that
we have determined for its inputs. Otherwise, what would be the point of expending so much effort on
getting these distributions right? Latin hypercube sampling addresses this issue by providing a sampling
method that appears random but that also guarantees to reproduce the input distribution with much
greater efficiency than Monte Carlo sampling.

4.4.3 Latin Hypercube sampling
Latin hypercube sampling, or LHS, is an option that is now available for most risk analysis simulation
software programs. It uses a technique known as "stratified sampling without replacement" (Iman,
Davenport and Zeigler, 1980) and proceeds as follows:
• The probability distribution is split into n intervals of equal probability, where n is the number of iterations that are to be performed on the model. Figure 4.18 illustrates an example of the stratification that is produced for 20 iterations of a normal distribution. The bands can be seen to get progressively wider towards the tails as the probability density drops away.
• In the first iteration, one of these intervals is selected using a random number.
• A second random number is then generated to determine where, within that interval, F(x) should lie. In practice, the second half of the first random number can be used for this purpose, reducing simulation time.
• x = G(F(x)) is calculated for that value of F(x).
• The process is repeated for the second iteration, but the interval used in the first iteration is marked as having already been used and therefore will not be selected again.
• This process is repeated for all of the iterations. Since the number of iterations n is also the number of intervals, each interval will only have been sampled once and the distribution will have been reproduced with predictable uniformity over the F(x) range.

Figure 4.18 Example of the effect of stratification in Latin hypercube sampling.
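
A bare-bones version of this stratified scheme is easy to write; the sketch below is my own illustration in Python, not the algorithm of any particular package. It splits the F(x) scale into n equal-probability bands, samples once inside each band, shuffles the band order and pushes the values through the inverse cumulative function.

import numpy as np
from scipy import stats

def latin_hypercube_sample(dist, n, rng):
    """One LHS sample of size n from a scipy frozen distribution `dist`."""
    bands = np.arange(n)                          # the n equal-probability intervals
    u = (bands + rng.random(n)) / n               # a random point inside each interval
    return dist.ppf(rng.permutation(u))           # invert F(x), in shuffled band order

rng = np.random.default_rng(1)
normal = stats.norm(loc=0, scale=1)

lhs = latin_hypercube_sample(normal, 300, rng)
mc = rng.normal(0.0, 1.0, size=300)               # plain Monte Carlo for comparison

# LHS reproduces the input distribution's statistics more tightly than Monte Carlo
print("LHS mean/sd:", lhs.mean().round(4), lhs.std().round(4))
print("MC  mean/sd:", mc.mean().round(4), mc.std().round(4))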
The improvement offered by LHS over Monte Carlo can be easily demonstrated. Figure 4.19 compares the results obtained by sampling from a Triangle(0, 10, 20) distribution with LHS and Monte
Carlo sampling. The top panels of Figure 4.19 show histograms of the triangular distribution after one
simulation of 300 iterations. The LHS clearly reproduces the distribution much better. The middle panels
of Figure 4.19 show an example of the convergence of the two sampling techniques to the true values
of the distribution's mean and standard deviation. In the Monte Carlo test, the distribution was sampled
50 times, then another 50 to make 100, then another 100 to make 200, and so on to give simulations of
50, 100, 200, 300, 500, 1000 and 5000 iterations. In the LHS test, seven different simulations were run
for the seven different numbers of iterations. The difference between the approaches was taken because
the LHS has a "memory" and the Monte Carlo sampling does not. A "memory" is where the sampling
algorithm takes account of from where it has already sampled in the distribution. From these two panels, one can get the feel for the consistency provided by LHS. The bottom two panels provide a more
general picture. To produce these diagrams, the triangular distribution was sampled in seven separate
simulations again with the following number of iterations: 50, 100, 200, 300, 500, 1000 and 5000 for
both LHS and Monte Carlo sampling. This was repeated 100 times and the mean and standard deviation
of the results were noted. The standard deviations of these statistics were calculated to give a feel for

Figure 4.19 Comparison of the performance of Monte Carlo and Latin hypercube sampling.

Figure 4.20 Example comparison of the convergence of the mean for Monte Carlo and Latin hypercube sampling.

how much the results might naturally vary from one simulation to another. LHS consistently produces
values for the distribution's statistics that are nearer to the theoretical values of the input distribution
than Monte Carlo sampling. In fact, one can see that the spread in results using just 100 LHS samples
is smaller than the spread using 5000 MC samples!
The benefit of LHS is eroded if one does not complete the number of iterations nominated at the beginning, i.e. if one halts the program in mid-simulation. Figure 4.20 illustrates an example where a Normal(1, 0.1) distribution is simulated for 100 iterations with both Monte Carlo sampling and LHS. The mean of the values generated has roughly the same degree of variance from the true mean of 1 until the number of iterations completed gets close to the prescribed 100, when LHS pulls more sharply in to the desired value.

4.4.4 Other sampling methods
There are a couple of other sampling methods, and I mention them here for completeness, although they
do not appear very often and are not offered by the standard risk analysis packages. Mid-point LHS is a version of standard LHS where the mid-point of each interval is used for the sampling. In other words, the data points x_i generated from a distribution using n iterations will be at the (i - 0.5)/n percentiles. Mid-point LHS will produce even more precise and predictable values for the output statistics than LHS, and in most situations it would be very useful. However, there are the odd occasions where its equidistancing between the F(x) values causes interference effects that would not be observed in standard LHS.
In certain problems, one might only be concerned with the extreme tail of the distribution of possible outcomes. In such cases, even a very large number of iterations may fail to produce sufficient values in the extreme tail of the output for an accurate representation of the area of interest. It can then be useful to employ importance sampling (Clark, 1961), which artificially raises the probability of sampling from the ranges within the input distributions that would cause the extreme values of interest in the output. The accentuated tail of the output distribution is rescaled back to its correct probability density at the end of the simulation, but there is now good detail in the tail. In Section 4.5.1 we will


look at another method of simulation that ensures that one can get sufficient detail in the modelling of
rare events.
Sobol numbers are non-random sequences of numbers that progressively fill in the Latin hypercube space. The advantage they offer is that one can keep adding more iterations and they keep filling the gaps previously left. Contrast that with LHS, for which we need to define the number of iterations at the beginning of the simulation and, once it is complete, we have to start again - we can't build on the sampling already done.

4.4.5 Random number generator seeds
There are many algorithms that have been developed to generate a series of random numbers between
0 and 1 with equal probability density for all possible values. There are plenty of reviews you can
find online. The best general-purpose algorithm is currently widely held to be the Mersenne twister.
These algorithms will start with a value between 0 and 1, and all subsequent random numbers that are
generated will rely on this initial seed value. This can be very useful. Most decent risk analysis packages
now offer the option to select a seed value. I personally do this as a matter of course, setting the seed
to 1 (because I can remember it!). Providing the model is not changed, and that includes the position
of the distributions in a spreadsheet model and therefore the order in which they are sampled, the same
simulation results can be exactly repeated. More importantly, one or more distributions can be changed
within the model and a second simulation run to look at the effect these changes have on the model's
outputs. It is then certain that any observed change in the result is due to changes in the model and not
a result of the randomness of the sampling.
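
In a plain scripting environment the same discipline applies: fix the generator's seed so that a re-run, or a run with one distribution deliberately changed, is compared against identical random samples. A minimal illustration in Python:

import numpy as np

# Same seed => exactly the same stream of samples, so any change in the
# output is due to a change in the model, not to sampling noise
run1 = np.random.default_rng(seed=1).normal(100, 10, size=5)
run2 = np.random.default_rng(seed=1).normal(100, 10, size=5)
print(np.array_equal(run1, run2))   # True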

4.5 Simulation Modelling
My cardinal rule of risk analysis modelling is: "Every iteration of a risk analysis model must be a
scenario that could physically occur". If the modeller follows this "cardinal rule", he or she has a much
better chance of producing a model that is both accurate and realistic and will avoid most of the problems
I so frequently encounter when reviewing a client's work. Section 7.4 discusses the most common risk
modelling errors.
A second very useful rule is: "Simulate when you can't calculate". In other words, don't simulate when
it is possible and not too onerous to determine exactly the answer directly through normal mathematics.
There are several reasons for this: simulation provides an approximate answer and mathematics can
give an exact answer; simulation will often not be able to provide the entire distribution, especially at
the low probability tails; mathematical equations can be updated instantaneously in light of a change in
the value of a parameter; and techniques like partial differentiation that can be applied to mathematical
equations provide methods to optimise decisions much more easily than simulation. In spite of all these
benefits, algebraic solutions can be excessively time consuming or intractable for all but the simplest
problems. For those who are not particularly mathematically inclined or trained, simulation provides an
efficient and intuitive approach to modelling risky issues.

4.5.1 Rare events
It is often tempting in a risk analysis model to include very unlikely events that would have a very
large impact should they occur; for example, including the risk of a large earthquake in a cost model

64

Risk Analysis

of a Sydney construction project. True, the large earthquake could happen and the effect would be
devastating, but there is generally little to be gained from including the rare event in an overview
model.
The expected impact of a rare event is determined by two factors: the probability that it will occur
and, if it did occur, the distribution of possible impact it would have. For example, we may determine
that there is about a 1:50 000 chance of a very large earthquake during the construction of a skyscraper.
However, if there were an earthquake, it would inflict anything between a few hundred pounds damage
and a few million.
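
If such a risk were nevertheless included, the natural structure is the product of a yes/no occurrence variable and a conditional impact distribution. The Python sketch below uses the 1:50 000 probability from the text but an entirely assumed lognormal impact distribution (the "few hundred to a few million pounds" range is only indicative), so the numbers are illustrative only.

import numpy as np

rng = np.random.default_rng(1)
n_iter = 1_000_000

p_quake = 1 / 50_000                          # probability of the rare event
occurs = rng.random(n_iter) < p_quake         # Bernoulli occurrence per iteration

# Assumed impact distribution, conditional on occurrence (illustrative values only):
# a lognormal spanning roughly a few hundred to a few million pounds
impact = rng.lognormal(mean=np.log(50_000), sigma=2.0, size=n_iter)

cost = occurs * impact
print("expected cost per project: £%.2f" % cost.mean())
print("iterations in which the quake occurred:", occurs.sum())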
In general, the distribution of the impact of a rare event is far more straightforward to determine than
the probability that the rare event will occur in the first place. We often can be no more precise about
the probability than to within one or two orders of magnitude (i.e. to within a factor of 10-100). It is
usually this determination of the probability of the event that provides a stumbling block for the analyst.
One method to determine the probability is to look at past frequencies and assume that they will
represent the future. This may be of use if we are able to collect a sufficiently large and reliable dataset.
Earthquake data in the New World, for example, only extends for 200 or 300 years and could give us,
at its smallest, a one in 200 year probability.
Another method, commonly used in fields like nuclear power reliability, is to break the problem
down into components. For an explosion to occur in a nuclear power station (excluding human error),
a potential hazard would have to occur and a string of safety devices would all have to fail together.
The probability of an explosion is the product of the probability of the initial conditions necessary for
an explosion and the probabilities of each safety device failing. This method has also been applied in
epidemiology where agricultural authorities have sought to determine the risk of introduction of an exotic
disease. These analyses typically attempt to map out the various routes through which contaminated
animals or animal products can enter the country and then infect the country's livestock. In some cases,
the structure of the problem is relatively simple and the probabilities can be reasonably calculated; for
example, the risk of introducing a disease through importing semen straws or embryos. In this case
the volume is easily estimated, its source determinable and regulations can be imposed to minimise
the risk.
In other cases, the structure of the problem is extremely complex and a sensible analysis may be
impossible except to place an upper limit on the probability; for example, the risk of introducing disease
into native fish by importing salmon. There are so many paths through which a fish in a stream or fish
farm could be exposed to imported contaminated salmon, ranging from a seagull picking up a scrap
from a dump and dropping it in a stream right in front of a fish to a saboteur deliberately buying some
salmon and feeding it to fish in a farm. It is clearly impossible to cover all of the scenarios that might
exist, or even to calculate the probability of each individual scenario. In such cases, it makes more sense
to set an upper bound to the probability that infection occurs.
It is very common for people to include rare events in a risk analysis model that is primarily concerned with the general uncertainty of the problem, but doing so provides little extra insight. For example, we might
construct a model to estimate how long it will take to develop a software application for a client:
designing, coding, testing, etc. The model would be broken down into key tasks and probabilistic
estimates made for the duration of each task. We would then run a simulation to find the total effect
of all these uncertainties. We would not include in such an analysis the effect of a plane crashing into
the office or the project manager quitting. We might recognise these risks and hold back-up files at
a separate location or make the project manager sign a tight contract, but we would gain no greater
understanding of our project's chance of meeting the deadline by incorporating such risks into our
model.


4.5.2 Model uncertainty
Model building is subjective. The analyst has to decide the way he will build a necessarily simple model
to attempt to represent a frequently very complicated reality. One needs to make decisions about which
bits can be left out as insignificant, perhaps without a great deal of data to back up the decision. We also
have to reason which type of stochastic process is actually operating. In truth, we rarely have a purely
binomial, Poisson or any other theoretical stochastic process occurring in nature. However, we can often
convince ourselves that the degree of deviation from the simplified model we chose to use is not terribly
significant. It is important in any model to consider how it could fail to represent the real world. In any
mathematical abstraction we are making certain assumptions, and it is important to run through these
assumptions, both the explicit assumptions that are easy to identify and the implicit assumptions that one
may easily fail to spot. For example, using a Poisson process to model frequencies of epidemics may
seem quite reasonable, as they could be considered to occur randomly in time. However, the individuals
in one epidemic can be the source of the next epidemic, in which case the events are not independent.
Seasonality of epidemics means that the Poisson intensity varies with month, which can be catered for
once it is recognised, but if there are other random elements affecting the Poisson intensity then it may
be more appropriate to model the epidemics as a mixture process.
Sometimes one may have two possible models (for example, two equations relating bacteria growth
rates to time and ambient temperature, or two equations for the lifetime of a device), both of which
seem plausible. In my view, these represent subjective uncertainty that should be included in the model,
just as other uncertain parameters have distributions assigned to them. So, for example, if I have two
plausible growth models, I might use a discrete distribution to use one or the other randomly during
each iteration of the model.
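
A minimal sketch of that idea, with two entirely hypothetical growth equations and an assumed 60/40 subjective weighting (none of which come from the book), might look like this: on each iteration a discrete (here Bernoulli) draw decides which model generates the value.

import numpy as np

rng = np.random.default_rng(1)
n_iter = 10_000

# Two competing growth models - hypothetical forms, for illustration only
def growth_model_a(t, temp):
    return 0.1 * t * np.exp(0.05 * temp)

def growth_model_b(t, temp):
    return 0.08 * t * (1 + 0.07 * temp)

t = 24.0                                    # hours
temp = rng.normal(20, 2, size=n_iter)       # assumed ambient temperature distribution

# Subjective model weights: 60 % model A, 40 % model B (assumed values)
use_a = rng.random(n_iter) < 0.6
growth = np.where(use_a, growth_model_a(t, temp), growth_model_b(t, temp))

print("mean predicted growth:", growth.mean())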
There is no easy solution to the problems of model uncertainty. It is essential to identify the simplifications and assumptions one is making when presenting the model and its results, in order for the
reader to have an appropriate level of confidence in the model. Arguments and counterarguments can
be presented for the factors that would bring about a failure of the model. Analysts can be nervous
about pointing out these assumptions, but practical decision-makers will understand that any model has
assumptions and they would rather be aware of them than not. In any case, I think it is always much
better for me to be the person who points out the potential weaknesses of my models first. One can also
often analyse the effects of changing the model assumptions, which gives the reader some feel for the
reliability of the model's results.

Chapter 5

Understanding and using the results
of a risk analysis
A risk analysis model, however carefully crafted, is of no value unless its results are understandable,
useful, believable and tailored to the problem in hand. This chapter looks at various techniques to help
the analyst achieve these goals.
Section 5.1 gives a brief overview of the points that should be borne in mind in the preparation of
a risk analysis report. Section 5.2 looks at how to present the assumptions of the model in a succinct
and comprehensible way. The results of a risk analysis model are far more likely to be accepted by
decision-makers if they understand the model and accept its assumptions.
Section 5.3 illustrates a number of graphical presentations that can be employed to demonstrate a
model's results and offers guidance for their most appropriate use. Finally, Section 5.4 looks at a
variety of statistical analyses that can be performed on the output data of a risk analysis.
In addition to writing comprehensive risk analysis reports, I have found it particularly helpful to my
clients to run short courses for senior management that explain:
how to manage a risk assessment (time and resources required, typical sequence of activities, etc.);
how to ensure that a risk assessment is being performed properly;
what a risk assessment can and cannot do;
what outputs one can ask for;
how to interpret, present and communicate a risk assessment and its results.
This type of training eases the introduction of risk analysis into an organisation. We see many organisations where the engineers, analysts, scientists, etc., have embraced risk analysis, trained themselves
and acquired the right tools, and then fail to push the extra knowledge up the decision chain because
the decision-makers remain unfamiliar with, and perhaps intimidated by, all this new "risk analysis stuff". If
you are intending to present the results of a risk analysis to an unknown audience, consider assuming
that the audience knows nothing about risk analysis modelling and explain some basic concepts (like
Monte Carlo simulation) at the beginning of the presentation.

5.1 Writing a Risk Analysis Report
Complex models, probability distributions and statistics often leave the reader of a risk analysis report
confused (and probably bored). The reader may have little understanding of the methods employed
in risk analysis or of how to interpret, and make decisions from, its results. In this environment it is
essential that a risk analysis report guide the reader through the assumptions, results and conclusions (if
any) in a manner that is transparently clear but neither esoteric nor oversimplistic.


The model's assumptions should always be presented in the report, even if only in a very shorthand
form. I have found that a report puts across its message to the reader much more effectively if these
model assumptions are put to the back of the report, the front being reserved for the model's results,
an assessment of its robustness (see Chapter 3) and any conclusions. We tend to write reports with the
following components (depending on the situation):
• summary;
• introduction to problem;
• decision questions addressed and those not addressed;
• discussion of available data and relation to model choice;
• major model assumptions and the impact on the results if incorrect;
• critique of model, comment on validation;
• presentation of results;
• discussion of possible options for improvement, extra data that would change the model or its results,
  additional work that could be done;
• discussion of modelling strategy;
• decision question(s);
• available data;
• methods of addressing decision questions with available information;
• assumptions inherent in different modelling options;
• explanation of choice of model;
• discussion of model used;
• overview of model structure, how the sections relate together;
• discussion of each section (data, mathematics, assumptions, partial results);
• results (graphical and statistical analyses);
• model validation;
• references and datasets;
• technical appendices;
• explanation of unusual equation derivations;
• guide on how to interpret and use statistical and graphical outputs.


The results of the model must be presented in a form that clearly answers the questions that the
analyst sets out to answer. It sounds rather obvious, but I have seen many reports that have failed in
this respect for several reasons:

• The report relied purely on statistics. Graphs help the reader enormously to get a "feel" for the
  uncertainty that the model is demonstrating.
• The key question is never answered. The reader is left instead to make the last logical step. For
  example, a distribution of a project's estimated cost is produced but no guidance is offered for
  determining a budget, risk contingency or margin.
• The graphs and statistics use values to five, six or more significant figures. This is an unnatural way
  for most readers to think of values and impairs their ability to use the results.


• The report is filled with volumes of meaningless statistics. Risk analysis software programs, like
  @RISK and Crystal Ball, automatically generate very comprehensive statistics reports. However,
  most of the statistics they produce will be of no relevance to any one particular model. The analyst
  should pare down any statistics report to those few statistics that are germane to the problem being
  modelled.
• The graphs are not properly labelled! Arrows and notes on a graph can be particularly useful.
In summary:

1. Tailor the report to the audience and the problem.
2. Keep statistics to a minimum.
3. Use graphs wherever appropriate.
4. Always include an explanation of the model's assumptions.

5.2 Explaining a Model's Assumptions
We recommend that you are very explicit about your assumptions, and make a summary of them in a
prominent place in the report, rather than just have them scattered through the report in the explanation
of each model component.
A risk analysis model will often have a fairly complex structure, and the analyst needs to find ways of
explaining the model that can quickly be checked. The first step is usually to draw up a schematic diagram
of the structure of the model. The type of schematic diagram will obviously depend on the problem
being modelled: GANTT charts, site plans with phases, work breakdown structure, flow diagrams, event
trees, etc. - any pictorial representation that conveys the required information.
The next step is to show the key quantitative assumptions that are made for the model's variables.
Distribution parameters

Using the parameters of a distribution to explain how a model variable has been characterised will often
be the most informative when explaining a model's logic. We tend to use tables of formulae for more
technical models where there are a lot of parametric distributions and probability equations, because the
logic is apparent from the relationship between a distribution's parameters and other variables. For non-parametric distributions, which are generally used to model expert opinion, or to represent a dataset, a
thumbnail sketch helps the reader most. Influence diagram plots (Figure 5.1 illustrates a simple example)
are excellent for showing the flow of the logic and interrelationships between model components, but
not the mathematics underlying the links.
Graphical illustrations of quantitative assumptions are particularly useful when non-parametric distributions have been used. For example, a sketch of a VoseRelative (Custom in Crystal Ball, General in
@RISK), a VoseHistogram or a VoseCumulA distribution will be a lot more informative than noting
its parameter values. Sketches are also very good when you want to explain partial model results. For
example, summary plots are useful for demonstrating the numbers that come out of what might be a
quite complex time series model. Scatter plots are useful for giving an overview of what might be a
very complicated correlation structure between two or more variables.
Figure 5.2 illustrates a simple format for an assumptions report. Crystal Ball offers a report-writing
feature that will do most of this automatically. There will usually be a wealth of data behind these
key quantitative assumptions and the formulae that have been used to link them. Explanations of the


Figure 5.1 Example of a schematic diagram of a model's structure: total project cost broken down into components such as additional costs, inflation, risk of political change, risk of strike and risk of bad weather.

data and how they translate into the quantitative assumptions can be relegated to an appendix of a risk
analysis report, if they are to be included at all.

5.3 Graphical Presentation of a Model's Results
There are two forms in which a model's results can be presented: graphs and numbers. Graphs have the
advantage of providing a quick, intuitive way to understand what is usually a fairly complex, numberintensive set of information. Numbers, on the other hand, give us the raw data and statistics from which
we can make quantitative decisions. This section looks at graphical presentations of results, and the
following section reviews statistical methods of reporting. The reader is strongly encouraged to use
graphs wherever it is useful to do so, and to avoid intensive use of statistics.

5.3.1 Histogram plots
The histogram, or relative frequency, plot is the most commonly used in risk analysis. It is produced
by grouping the data generated for a model's output into a number of bars or classes. The number
of values in any class is its frequency. The frequency divided by the total number of values gives an
approximate probability that the output variable will lie in that class's range. We can easily recognise
common distributions such as triangular, normal, uniform, etc., and we can see whether a variable is
skewed. Figure 5.3 shows the result of a simulation of 500 iterations, plotted into a 20-bar histogram.
The most common mistake in interpreting a histogram is to read off the y-scale value as the probability of the x value occurring. In fact, the probability of any x value, given the output is continuous (and
most are), is infinitely small. If the model's output is discrete, the histogram will show the probability
of each allowable x value, providing the class width is less than the distance between each allowable x
value. The number of classes used in a histogram plot will determine the scale of the y axis. Clearly,
the wider the bar width, the more chance there will be that values will fall within it. So, for example,
by doubling the number of histogram bars, the probability scale will approximately halve.
Monte Carlo add-ins generally offer two options for scaling the vertical axis: density and relative
frequency plots, shown in Figures 5.4 and 5.5.
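As a hedged illustration of the two vertical-axis scalings (a sketch assuming Python with numpy and matplotlib rather than a Monte Carlo add-in): the density scaling divides each bar's relative frequency by the bar width so that the bar areas sum to one, while the relative-frequency scaling leaves the bar heights themselves summing to one.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
output = rng.lognormal(mean=10, sigma=0.3, size=5000)   # assumed model output values

counts, edges = np.histogram(output, bins=20)
width = np.diff(edges)
rel_freq = counts / counts.sum()          # bar heights sum to 1
density = rel_freq / width                # bar areas sum to 1

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.bar(edges[:-1], density, width=width, align="edge")
ax1.set_title("Density scaling")
ax2.bar(edges[:-1], rel_freq, width=width, align="edge")
ax2.set_title("Relative frequency scaling")
plt.show()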
In plotting a histogram, the number of bars should be chosen to balance between a lack of detail (too
few bars) and overwhelming random noise (too many bars). When the result of a risk analysis model

Figure 5.2 A simple format for an assumptions report: thumbnail sketches of the key input distributions (labour rate $/day, advertising budget $k/year, admin costs $k/year, transient market share, commission rate, plant costs $k, price charged per unit), together with a summary table of the production cost $/unit output giving the 10th percentile, mean and 90th percentile for each year from 2009 to 2019.


Figure 5.3 Doubling the number of bars on average halves the probability height for a bar.

Figure 5.4 Histogram "density" plot. The vertical scale is calculated so that the sum of the histogram
bar areas equals unity. This is only appropriate for continuous outputs (left). Simulation software won't
recognise if an output is discrete (right), so treats the generated output data in the same way as a continuous
output. The result is a plot where the probability values make no intuitive sense - in the right-hand plot the
probabilities appear to add up to more than 1. To be able to tell the probability of the output being equal to
4, for example, we first need to know the width of the histogram bar.

is a discrete distribution, it is usually advisable to set the number of histogram bars to the maximum
possible, as this will reveal the discrete nature of the output unless the output distribution takes a large
number of discrete values.
Some risk analysis software programs offer the facility to smooth out a histogram plot. I don't recommend this approach because: (a) it suggests greater accuracy than actually exists; (b) it fits a spline
curve that will accentuate (unnecessarily) any peaks and troughs; and (c) if the scale remains the same,
the area does not integrate to equal 1 unless the original bandwidths were one x-axis unit wide.
The histogram plot is an excellent way of illustrating the distribution of a variable, but is of little
value for determining quantitative information about that variability, which is where the cumulative
frequency plot takes over.
Several histogram plots can be overlaid on each other if the histograms are not filled in. This allows
one to make a visual comparison, for example, between two decision options one may be considering.
The same type of graph can also be used to represent the results of a second-order risk analysis model


Figure 5.5 Histogram "relative frequency" plot. The vertical scale is calculated as the fraction of the
generated values that fall into each histogram bar's range. Thus, the sum of the bar heights equals unity.
Relative frequency is only appropriate for discrete variables (right), where the histogram heights now sum to
unity. For continuous variables (left), the area under the curve no longer sums to unity.

where the uncertainty and variability have been separated, in which case each distribution curve would
represent the system variability given a random sample from the uncertainty distribution of the model.

5.3.2 The cumulative frequency plot
The cumulative frequency plot has two forms: ascending and descending, shown in Figure 5.6. The
ascending cumulative frequency plot is the more commonly used of the two and shows the probability
of being less than or equal to the x-axis value. The descending cumulative frequency plot, on the other
hand, shows the probability of being greater than or equal to the x-axis value. From now on, we shall
assume use of the ascending plot. Note that the mean of the distribution is sometimes marked on the
curve, in this case using a black square.
The cumulative frequency distribution of an output can be plotted directly from the generated data as
follows:
1. Rank the data in ascending order.


Figure 5.6 Ascending and descending cumulative frequency plots.


Figure 5.7 Producing a cumulative frequency plot from generated data points.

2. Next to each value, calculate its cumulative percentile Pi = i/(n + 1), where i is the rank of that
   data value and n is the total number of generated values. i/(n + 1) is used because it is the best
   estimate of the theoretical cumulative distribution function of the output that the data are attempting
   to reproduce.
3. Plot the data (x axis) against the i/(n + 1) values (y axis). Figure 5.7 illustrates an example.
A total of 200-300 iterations are usually quite sufficient to plot a smooth curve. The above technique
is very useful if one wishes to avoid using the standard format that Monte Carlo software offers and if
one wishes to plot two or more cumulative frequency plots together.
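A minimal sketch of the technique, assuming Python with numpy and matplotlib in place of a spreadsheet:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
values = rng.normal(1500, 400, size=300)     # assumed simulation output data

x = np.sort(values)                          # step 1: rank in ascending order
n = len(x)
p = np.arange(1, n + 1) / (n + 1)            # step 2: cumulative percentile i/(n+1)

plt.plot(x, p)                               # step 3: plot data against i/(n+1)
plt.xlabel("Output value")
plt.ylabel("Cumulative probability")
plt.show()

Overlaying a second output is then just a matter of repeating the sort-and-plot steps on its data.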
The cumulative frequency plot is very useful for reading off quantitative information about the distribution of the variable. One can read off the probability of exceeding any value; for example, the probability
of going over budget, failing to meet a deadline or of achieving a positive NPV (net present value).
One can also find the probability of lying between any two x-axis values: it is simply the difference
between their cumulative probabilities. From Figure 5.8 we can see that the probability of lying between
1000 and 2000 is 89 % - 48 % = 41 %.
The cumulative frequency plot is often used in project planning to determine contract bid prices and
project budgets, as shown in Figure 5.9. The budget is set as the expected (mean) value of the variable
determined from the statistics report. A risk contingency is then added to the budget to bring it up
to a cumulative percentile that is comfortable for the organisation. The risk contingency is typically
the amount available to project managers to spend without recourse to their board. The (budget +
contingency) value is set to match a cumulative probability that the board of directors is happy to plan
for: in this case 85 %. A more controlling board might set the sum at the 80th percentile or lower.
The margin is then added to the (budget + contingency) to determine a bid price or project budget.
The project cost might still possibly exceed the bid price and the company would then make a loss.
Conversely, they would hope, by careful management of the project, to avoid using all of the risk



Figure 5.8 Using the cumulative frequency plot to determine the probability of being between two values.


Figure 5.9 Using the cumulative frequency plot to determine appropriate values for a project's budget,
contingency and margin.

contingency and actually increase their margin. The x axis of a cumulative distribution of project cost
or duration can be thought of roughly as listing risks in decreasing order of importance. The easiest
risks to manage, i.e. those that should be removed with good project management, are the first to erode
the total cost or duration. So a target set at the 80th percentile, sometimes called the 20 % risk level, is
roughly equivalent to removing the identified, easily managed risks. Then there are those risks that will
be removed with a lot of hard work, good management and some luck, which brings us down to the



Figure 5.10 Overlaying of the cumulative frequency plots of several project milestones illustrates any
increase in uncertainty with time.

50th percentile, or so. To reduce the actual cost or duration to somewhere around the 20th percentile
will usually require very hard work, good management and a lot of luck.
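A hedged sketch of the budget/contingency arithmetic described above, assuming Python with numpy and a hypothetical simulated cost output (the 85th percentile target and 10 % margin are illustrative assumptions, not figures from the text):

import numpy as np

rng = np.random.default_rng(11)
cost = rng.lognormal(mean=np.log(120), sigma=0.15, size=10_000)  # assumed project cost output, £000s

budget = cost.mean()                       # budget set at the expected (mean) cost
target = np.percentile(cost, 85)           # budget + contingency at the 85th percentile
contingency = target - budget
margin = 0.10 * target                     # assumed 10 % margin for illustration
bid_price = target + margin

print(f"budget {budget:.0f}, contingency {contingency:.0f}, bid price {bid_price:.0f}")
print("P(cost > bid price):", (cost > bid_price).mean())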
It is sometimes useful to overlay cumulative frequency plots together. One reason to do this is to get
a visual picture of stochastic dominance, described in Section 5.4.5. Another reason is to visualise the
increase (or perhaps decrease) in uncertainty as a project progresses. Figure 5.10 illustrates an example
for a project with five milestones. The time until completion of a milestone becomes progressively more
uncertain the further from the start the milestone is. Furthermore, the results of a second-order risk
analysis can be plotted as a number of overlying cumulative distributions, each curve representing a
distribution of variability for a particular random sample from the uncertainty distributions of the model.

5.3.3 Second-order cumulative probability plot
A second-order cdf is the best presentation of an output probability distribution when you run a second-order Monte Carlo simulation. The second-order cdf is composed of many lines, each of which represents


Figure 5.11 A second-order plot of a discrete random variable. The step nature of the plot makes it difficult
to read.


Figure 5.12 Another second-order plot of a discrete variable, where the probabilities are marked with small
points and joined by straight lines. The connection between the probability estimates is now clear, and the
uncertainty and randomness components can now be compared: at its widest the uncertainty contributes
a spread of about two units (dashed horizontal line), while the randomness ranges over some eight units
(filled horizontal line), so the inability to predict this variable is more driven by its randomness than by our
uncertainty in the model parameters.


Figure 5.13 A second-order plot of a continuous variable where our inability to predict its value is equally
driven by uncertainty (dashed horizontal line) about the model parameters as by the randomness of the
system (filled horizontal line). This is a useful plot for decision-makers because it tells them potentially how
much more sure one would be of the predicted value if more information could be collected, and thus the
uncertainty reduced.

a distribution of possible variability or probability generated by picking a single value from each
uncertainty distribution in the model (Figures 5.11 to 5.13).
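A minimal sketch of how such a plot is built, assuming Python with numpy and matplotlib and a toy model whose uncertain parameters (mean and standard deviation) are assumptions for illustration: the outer loop samples the uncertainty distributions once, the inner loop samples the variability given those values, and each outer sample contributes one cdf line.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
n_outer, n_inner = 50, 2000

for _ in range(n_outer):
    # outer loop: one draw from the uncertainty distributions of the parameters
    mu = rng.normal(10, 1)          # assumed uncertain mean
    sigma = rng.uniform(1.5, 2.5)   # assumed uncertain standard deviation
    # inner loop: variability of the system given those parameter values
    sample = rng.normal(mu, sigma, size=n_inner)
    x = np.sort(sample)
    p = np.arange(1, n_inner + 1) / (n_inner + 1)
    plt.plot(x, p, color="grey", alpha=0.3, linewidth=0.7)

plt.xlabel("Value")
plt.ylabel("Cumulative probability")
plt.show()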

5.3.4 Overlaying of cdf plots

Several cumulative distribution plots can be overlaid together (Figure 5.14). The plots are easier to read
if the curves are formatted into line plots rather than area plots.



Figure 5.14 Several cumulative distribution plots (cost $000) overlaid together.

The overlaying of cumulative plots like this is an intuitive and easy way of comparing probabilities,
and is the basis of stochastic dominance tests. It is not very useful, however, for comparing the location,
spread and shape of two or more distributions, for which overlying density plots are much better.
We recommend that a complementary cumulative distribution plot be given alongside the histogram
(density) plot to provide the maximum information.

5.3.5 Plotting a variable with discrete and continuous elements
If a risk event does not occur, we could say it has zero impact, but if it occurs it will have an
uncertain impact. For example: a fire may have a 20 % chance of occurring and, if it does, will incur
a cost of $Lognormal(120 000, 30 000). We could model this as the product of a Bernoulli(0.2) event
indicator and the Lognormal impact distribution or, better still, as a single combined risk event
distribution (see Section 5.3.7).

Running a simulation with this variable as an output, we would get the uninformative, relative
frequency histogram plot (shown with different numbers of bars) in Figure 5.15.
There really is no useful way to show such a distribution as a histogram, because the spike at zero
(in this case) requires a relative frequency scale, while the continuous component requires a continuous
scale. A cumulative distribution, however, would produce the plot in Figure 5.16, which is meaningful.
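A hedged sketch of such a mixed discrete/continuous output and its cumulative plot, assuming Python with numpy and matplotlib (the lognormal is re-parameterised from the stated mean and standard deviation):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n = 10_000
p_fire = 0.20

occurs = rng.random(n) < p_fire
# convert mean 120 000 and sd 30 000 to the lognormal's log-space parameters
mean, sd = 120_000.0, 30_000.0
sigma2 = np.log(1 + (sd / mean) ** 2)
mu = np.log(mean) - sigma2 / 2
impact = np.where(occurs, rng.lognormal(mu, np.sqrt(sigma2), n), 0.0)

x = np.sort(impact)
p = np.arange(1, n + 1) / (n + 1)
plt.plot(x / 1000, p)            # the cumulative plot copes with the spike at zero
plt.xlabel("Cost (thousands)")
plt.ylabel("Cumulative probability")
plt.show()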

5.3.6 Relationship between cdf and density (histogram) plots
For a continuous variable, the gradient of a cdf plot is equal to the probability density at that value.
That means that, the steeper the slope of a cdf, the higher a relative frequency (histogram) plot would
look at that point (Figure 5.17).
The disadvantage of a cdf is that one cannot readily determine the central location or shape of the
distribution. We cannot even easily recognise common distributions such as triangular, normal and
uniform without practice in cdf form. Looking at the plots in Figure 5.18, you will readily identify the
distribution form from the left panels, but not so easily from the right panels.

Figure 5.15 Histogram plots of a risk event, drawn with 200, 100 and 40 bars (cost in thousands).

Figure 5.16 Cumulative distribution of a risk event (cost in thousands).



Figure 5.17 Relationship between density and cumulative probability curves.

For a discrete distribution, the cdf increases in steps equal to the probability of the x value occurring
(Figure 5.19).
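In symbols (a standard result, not specific to any software), writing F for the cumulative distribution function, f for the density of a continuous output and p for the probability mass of a discrete output:

$f(x) = \frac{\mathrm{d}F(x)}{\mathrm{d}x}, \qquad F(x) = \int_{-\infty}^{x} f(t)\,\mathrm{d}t, \qquad F(x) = \sum_{x_i \le x} p(x_i)$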

5.3.7 Crude sensitivity analysis and tornado charts
Most Monte Carlo add-ins can perform a crude sensitivity analysis that is often used to identify the
key input variables, as a precursor to performing a tornado chart or similar, more advanced, analysis on
these key variables. It achieves this by performing one of two statistical analyses on data that have been
generated from input distributions and data calculated for the selected output. Built into this operation
are two important assumptions:
1. All the tested input parameters have either a purely positive or a purely negative statistical correlation
   with the output.
2. Each uncertain variable is modelled with a single distribution.

Figure 5.18 Density and cumulative plots for some easily recognised distributions.

Figure 5.19 Relationship between probability mass and cumulative probability plots for a discrete
distribution.


Figure 5.20 Example input-output relationships for which crude sensitivity analysis is inappropriate.

Assumption 1 is rarely invalid, but would be incorrect if the output value were at a maximum or
minimum for an input value somewhere in the middle of its range (see, for example, Figure 5.20).
Assumption 2 is very often incorrect. For example, the impact of a risk event might be modelled as
the product of a Bernoulli (event occurrence) distribution and a Triangle (impact) distribution. Monte
Carlo software will generate the Bernoulli (or, equivalently, the binomial) and triangular distributions
independently. Performing the standard sensitivity analysis will evaluate the effect of the Bernoulli and
the triangular distributions separately, so the measured effect on the output will be divided between
these two distributions. ModelRisk gets round this by providing the function VoseRiskEvent, which
constructs a single distribution so that only one Uniform(0, 1) variate is used to drive the sampling of
the risk impact. @RISK can be set up in an equivalent way, so that a single distribution drives the
sampling for the risk event and the @RISK built-in sensitivity analysis will then work correctly.
Similarly, if you were an insurance company you might be interested in the impact on your corporate
cashflow of the aggregate claims distribution for some particular policy. ModelRisk offers a number
of aggregate distribution functions that internally calculate the aggregation of claim size and frequency
distributions. One can, for example, construct an aggregate distribution that returns the aggregate cost
of Poisson(5500) claims, each drawn independently from a Lognormal(2350, 1285) distribution, with
the generated aggregate cost value controlled by a single U variate.
ModelRisk has many such tools for simulating from constructed distributions to help you perform a
correct sensitivity analysis.
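A hedged sketch of the underlying idea (this is not ModelRisk's or @RISK's implementation; Python with numpy and scipy is assumed, and the Triangle parameters are illustrative): a single Uniform(0, 1) variate can drive a compound risk event by using part of the uniform's range to decide occurrence and rescaling the remainder to sample the impact by inversion, so a sensitivity analysis sees one input rather than two.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def risk_event(u, p, impact_dist):
    # a single U(0,1) drives both occurrence and impact:
    # u < 1 - p  -> no event (zero impact)
    # otherwise rescale the remaining range to (0,1) and invert the impact cdf
    u = np.asarray(u)
    scaled = np.clip((u - (1 - p)) / p, 0, 1)
    return np.where(u < 1 - p, 0.0, impact_dist.ppf(scaled))

u = rng.random(10_000)
impact = stats.triang(c=0.5, loc=50_000, scale=100_000)   # assumed Triangle(50k, 100k, 150k)
x = risk_event(u, p=0.2, impact_dist=impact)
print("P(event) ~", (x > 0).mean(), " mean impact given event ~", round(x[x > 0].mean()))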
Assumption 2 also means that this method of sensitivity analysis is invalid for a variable that is
modelled over a series of cells, like a time series of exchange rates or sales volumes. The automated analysis will evaluate the sensitivity of the output to each distribution within the time series


separately. You can still evaluate the sensitivity of a time series by running two simulations: one
with all the distributions of the time series simulating random values, the other with those distributions
locked to their expected values. If the two sets of results differ significantly, the time series variable is important.
Two statistical analyses

Tornado charts for two different methods of sensitivity analysis are in common use. Both methods plot
the variable against a statistic that takes values from -1 (the output is wholly dependent on this input,
but when the input is large, the output is small), through 0 (no influence) to +1 (the output is wholly
dependent on this input, and when the input is large, the output is also large):

• Stepwise least-squares regression between collected input distribution values and the selected output
  values. The assumption here is that there is a relationship between each input I and the output O
  (when all other inputs are held constant) of the form O = I * m + c, where m and c are constants.
  That assumption is correct for additive and subtractive models, and will give very accurate results
  in those circumstances, but is otherwise less reliable and somewhat unpredictable. The r-squared
  statistic is then used as the measure of sensitivity in a tornado chart.
• Rank order correlation. This analysis replaces each collected value by its rank among the other values
  generated for that input or output, and then calculates the Spearman's rank order correlation coefficient r between each input and the output. Since this is a non-parametric analysis, it is considerably
  more robust than the regression analysis option where there are complex relationships between the
  inputs and output.
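As a hedged sketch of the rank order correlation option (Python with numpy and scipy assumed; the input names and toy model below are hypothetical, not from the text), the stored input and output samples can be ranked and correlated directly:

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n = 5000
# assumed stored samples of three inputs and one output
inputs = {
    "labour rate": rng.normal(300, 40, n),
    "material cost": rng.lognormal(4, 0.3, n),
    "duration": rng.triangular(20, 30, 60, n),
}
output = (inputs["labour rate"] * inputs["duration"]
          + 1000 * inputs["material cost"]
          + rng.normal(0, 2000, n))

# Spearman's rank order correlation between each input and the output
tornado = {name: spearmanr(x, output)[0] for name, x in inputs.items()}
for name, r in sorted(tornado.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:15s} {r:+.2f}")

Sorting the bars by absolute correlation gives the usual tornado ordering.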


Tornado charts are used to show the influence an input distribution has on the change in value of
the output (Figure 5.21). They are also useful to check that the model is behaving as you expect. Each
input distribution is represented by a bar, and the horizontal range the bars cover gives some measure of
the input distribution's influence on the selected model output. Their main use is as a quick overview
to identify the most influential input model parameters. Once these parameters are determined, other
sensitivity analysis methods like spider plots and scatter plots are more effective.

Figure 5.21 Examples of tornado charts: profit sensitivity (correlation statistic) and profit variation (output units).

The left-hand plot of Figure 5.21 is the crudest type of sensitivity analysis, where some measure of
the statistical correlation is calculated between the input and output values. The logic is that,
the higher the degree of correlation between the input and output variables, the more the input variable
is affecting the output. The degree of correlation can be calculated using either rank order correlation
or stepwise least-squares regression. My preference is to use rank order correlation because it makes
no assumption about the form of relationship between the input and the output, beyond the assumption
that the direction of the relationship is the same across the entire input parameter's range. Least-squares
regression, on the other hand, assumes that there would be a straight-line relationship between the input
and the output variables. If the model is a sum of costs or task durations, or some other purely additive
model, this assumption is fine. However, divisions and power functions in a model will strongly violate
such an assumption. Be careful with this simple type of sensitivity because input-output relationships
that strongly deviate from a continuously increasing or decreasing trend can be completely missed. The
x-axis scale is a correlation statistic so is not very intuitive because it does not relate to the impact on
the output in terms of the output's units. Moreover, rank order correlation can be deceptive. Consider
the following simple model:

C = Normal(1, 3)

D (output) = A + B + C

Running a simulation gives almost identical rank correlation statistics between A and the output and
between B and the output, with a much smaller value for C.

Clearly from the model structure we can see that variable A is actually driving most of the output
uncertainty. If we set the standard deviation of each variable to zero in turn and compare the drop in
standard deviation of the output (a good measure of variation in this case because we are just adding
normal distributions), then
A : drops output standard deviation by 85.1562 %

B : drops output standard deviation by 0.0004 %
C : drops output standard deviation by 1.1037 %

which tells an entirely different story from the regression and correlation statistics. The reason for this is
that variable B is being driven by A, so the influence of A is being divided essentially equally between
A and B. A proper regression analysis would require us to build in the direction of influence from A to
B, and then the influence of B would come out as insignificant, but to do so we would have to specify
that relationship - a very difficult thing to do in a complex spreadsheet model.
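A hedged demonstration of this effect (Python with numpy and scipy assumed; the definitions of A and B are not recoverable from the text, so the ones below are assumptions chosen so that B is driven almost entirely by A):

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(8)
n = 20_000

# assumed definitions: A has a large spread and B is almost a copy of A
A = rng.normal(0, 10, n)
B = A + rng.normal(0, 0.1, n)
C = rng.normal(1, 3, n)
D = A + B + C                      # the output

for name, x in [("A", A), ("B", B), ("C", C)]:
    print(name, "rank correlation with D:", round(spearmanr(x, D)[0], 3))
# A and B show almost identical correlations even though B has essentially
# no independent influence; C's correlation is much smaller.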


The right-hand plot of Figure 5.21 is a little more robust and is typically created by fixing an input
distribution to a low value (say its 5th percentile), running a simulation, recording the output mean and
then repeating the process with a medium value (say the 50th percentile) and a high value (say the 95th
percentile) of the input distribution: these output means define the extremes of the bars. This type of
plot is a cut-down version of a spider plot. It is a little more robust, and the x-axis scale is in units of
the output so is more intuitive.
At low levels of correlation you will often see a variable with correlations of the opposite sign to
what you would expect. This is particularly so for rank order correlation. It just means that the level
of correlation is so low that a spurious correlation of generated values will occur. For presentation
purposes, it will obviously be better to remove these bars.
It is standard practice to plot the variables from the top down in decreasing size of correlation.
If there are positive and negative correlations, the result looks a bit like a tornado, hence the name.
It is sensible, of course, to limit the number of variables that are shown on the plot. I usually limit
the plot to those variables that have a correlation of at least a quarter of the maximum observed
correlation, or at least down to the first correlation that has the opposite sign to what one would
logically have expected. This usually means that below such levels of correlation the relationships
are statistically insignificant, although of course one can make a mistake in reasoning the sense of a
correlation.
The tornado chart is useful for identifying the key variables and uncertain parameters that are driving
the result of the model. It makes sense that, if the uncertainty of these key parameters can be reduced
through improved knowledge, or the variability of the problem can be reduced by changing the system,
the total uncertainty of the problem will be reduced too. The tornado chart is therefore very useful for
planning any strategy for the reduction of total uncertainty. The key model components can often be
made more certain by:
Collecting more information on the parameter if it has some level of uncertainty.
Determining strategies to reduce the effect of the variability of the model component. For a project
schedule, this might be altering the project plan to take the task off the critical path. For a project
cost, this might be offloading the uncertainty via a fixed-price subcontract. For a model of the
reliability of a system, this might be increasing the scheduled number of checks or installing some
parallel redundancy.
The rank order correlation between the model components and its output can be easily calculated
if the uncertainty and variability components are all simulated together, because the simulation software will have all the values generated for the input distributions and the output together in the one
database. It may sometimes be useful to show in a tornado chart that certain model components are
uncertain and others are variable by using, for example, white bars for uncertainty and black bars for
variability.

5.3.8 More advanced sensitivity analysis with spider plots
To construct a spider plot we proceed as follows:

• Before starting, set the number of iterations to a fairly low value (e.g. 300).
• Determine the input distributions to analyse (performing a crude sensitivity analysis will guide you).

• Determine the cumulative probabilities you wish to test (we generally use 1 %, 5 %, 25 %, 50 %,
  75 %, 95 %, 99 %).
• Determine the output statistic you wish to measure (mean, a particular percentile, etc.).

Then:

• Select an input distribution.
• Replace the distribution with one of the percentiles you specified.
• Run a simulation and record the statistic of the output.
• Select the next cumulative percentile and run another simulation.
• Repeat until all percentiles have been run for this input, then put back the distribution and move on
  to the next selected input.

Once all inputs have been treated this way, we can produce the spider plot shown in Figure 5.22.
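A hedged sketch of the whole procedure, assuming Python with numpy, scipy and matplotlib and a purely hypothetical two-input profit model (the input names, distributions and profit equation are illustrative assumptions):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

percentiles = [0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99]
n_iter = 300

# assumed input distributions and a hypothetical profit model
inputs = {"oil price": stats.norm(70, 15),
          "opex": stats.lognorm(s=0.3, scale=40)}

def profit(oil_price, opex):
    return 2e6 * oil_price - 1e6 * opex

def mean_profit(fixed_name=None, fixed_value=None):
    draws = {name: dist.rvs(n_iter) for name, dist in inputs.items()}
    if fixed_name is not None:
        draws[fixed_name] = np.full(n_iter, fixed_value)   # input locked at one percentile
    return profit(draws["oil price"], draws["opex"]).mean()

for name, dist in inputs.items():
    means = [mean_profit(name, dist.ppf(p)) for p in percentiles]
    plt.plot(percentiles, means, marker="o", label=name)

plt.xlabel("Input distribution percentile")
plt.ylabel("Mean of profit")
plt.legend()
plt.show()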
This type of plot usually has several horizontal lines for variables that have almost no influence on
the output. It makes the graph a lot clearer to delete these (Figure 5.23).
Now we can very clearly see how the output mean is influenced by each input. The vertical range
produced by the oil price line shows the range of expected profits there would be if the oil price were
fixed somewhere between its minimum and maximum (a range of $180 million). The next largest range
is for the gas price ($110 million), etc. The analysis helps us understand the degree of sensitivity in
terms that decision-makers understand, as opposed to correlation or regression coefficients. The plot will also
allow us to see variables that have unusual relationships, e.g. a variable that has no influence except at
its extremes, or some sort of U-shaped relationship that would be missed in a correlation analysis.
Figure 5.22 Spider plot example: mean of profit vs input distribution percentile (inputs include thickness, exchange rate and oil price).


Figure 5.23 Spider plot example with inconsequential variables removed.

5.3.9 More advanced sensitivity analysis with scatter plots
By plotting the generated values for an input against the corresponding output values for each model
iteration in a scatter plot, one can get perhaps the best understanding of the effect of the input on the
output value. Plotting generated values for two outputs is also commonly done; for example, plotting
a project's duration against its total cost. Scatter plots are easy to produce by exporting the simulation
data at the end of a simulation into Excel.
It takes a little effort to generate these scatter plots, so we recommend that you first perform a rough
sensitivity analysis to help you determine which of a model's input distributions most affect the
output.
Figure 5.24 shows 3000 points, which is enough to get across any relationship but not too many to
block out central areas if you use small circular markers. The chart tells the story that the model predicts
increasing advertising expenditure will increase sales - up to a point. Since this is an Excel plot we
can add a few useful refinements. For example, we could show scenarios above and below a certain
advertising budget (Figure 5.25).
We could also perform some statistical analysis of the two subsets, like a regression analysis
(Figure 5.26 shows how in an Excel chart).
The equations of the fitted lines show that you are getting about 3 times more return for your advertising dollar below $150k than above (0.0348/0.0132 ≈ 2.6). It is also possible, though mind-bogglingly
tedious, to plot scatter plot matrices in Excel to show the interrelationship of several variables. Much
better is to export the generated values to a statistical package like SPSS. At the time of writing (2007),
planned versions of @RISK and Crystal Ball will also do this.
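A hedged sketch of the split scatter plot and per-subset regression (Python with numpy and matplotlib assumed; the advertising-sales relationship and its coefficients are hypothetical, not the model in the text):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(9)
n = 3000
advertising = rng.uniform(50, 300, n)                       # assumed spend, $k
sales = (0.03 * np.minimum(advertising, 150)
         + 0.01 * np.maximum(advertising - 150, 0)
         + rng.normal(0, 0.5, n))                           # hypothetical diminishing returns

below = advertising < 150
for mask, colour, label in [(below, "tab:blue", "below $150k"), (~below, "tab:red", "above $150k")]:
    plt.scatter(advertising[mask], sales[mask], s=4, c=colour, label=label)
    slope, intercept = np.polyfit(advertising[mask], sales[mask], 1)   # per-subset regression
    xs = np.array([advertising[mask].min(), advertising[mask].max()])
    plt.plot(xs, slope * xs + intercept, c=colour)
    print(f"{label}: slope {slope:.4f}")

plt.xlabel("Advertising expenditure $k")
plt.ylabel("Sales $m")
plt.legend()
plt.show()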


Figure 5.24 Example scatter plot (advertising expenditure $k).

Figure 5.25 Scatter plot separating scenarios where expenditure was above or below $150k.

5.3.10 Trend plots
If a model includes a time series forecast or other type of trend, it is useful to be able to picture the
general behaviour of the trend. A trend or summary plot provides this information. Figure 5.27 illustrates
an example using the mean and 5th, 20th, 80th and 95th percentiles. Trend plots can be plotted using
cumulative percentiles as shown here, or with the mean ± one and two standard deviations, etc. I
recommend that you avoid using standard deviations, unless they are of particular interest for some
technical reason, because a spread of, say, one standard deviation around the mean will encompass a


Figure 5.26 Scatter plot with separate regression analysis for scenarios above or below $150k.
Figure 5.27 A trend or summary plot of market size predictions, showing the mean and the 5th, 20th, 80th and 95th percentiles.

varying percentage of the distribution depending on its form. That means that there is no consistent
probability interpretation attached to mean ± k standard deviations.
The trend plot is useful for reviewing a trending model to ensure that seasonality and any other patterns
are being reproduced. One can also see at a glance whether nonsensical values are being produced; a
forecasting series can be fairly tricky to model, as described in Chapter 12, so this is a nice reality check.
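A minimal sketch of building such a plot from stored simulation data, assuming Python with numpy and matplotlib and a purely hypothetical market-size forecast model:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(10)
n_iter, n_years = 5000, 10
years = np.arange(2009, 2009 + n_years)

# assumed simulation output: one market-size forecast per iteration per year
drift = rng.normal(0.02, 0.05, size=(n_iter, n_years)).cumsum(axis=1)
market = 25_000 * np.exp(drift)

for p, style in [(5, ":"), (20, "--"), (80, "--"), (95, ":")]:
    plt.plot(years, np.percentile(market, p, axis=0), style, label=f"{p}th percentile")
plt.plot(years, market.mean(axis=0), "-", label="Mean")
plt.xlabel("Year")
plt.ylabel("Market size")
plt.legend()
plt.show()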
An alternative to the trend plot above is a Tukey or box plot (Figure 5.28).
A Tukey plot is more commonly used to represent variations between datasets, but it does have
the possibility of including more information than trend plots. A word of caution: the minimum and
maximum generated values from a simulation can vary enormously between simulations with different
random number seeds, which means they are not usually values to be relied upon. Plotting the maximum
value of an inflation model going out 15 years, for example, might produce a very large value if you
ran it for many iterations, and that single value would then dominate the graph scaling.


Figure 5.28 A Tukey or box plot of sales, marking the mean and 50th percentile. The box contains the 25-75 percentile range.

5.3.11 Risk-return plots
Risk-return (or cost-benefit) plots are one way of graphically comparing several decision options on the
same plot. The expected return in some appropriate measure is plotted on the vertical axis versus the
expected cost in some measure on the horizontal axis (Figure 5.29).
The plot should be tailored to the decision question, and it may be useful to plot two or more such
plots to show different aspects.
Examples of measures of return (benefit) are as follows:
the probability of making a profit;
the income or expected return;
the number of animals that could be imported for a given level of risk (if one were looking at various
border control options for disease control, say);
the number of extra votes that would be gained in an election campaign;
the time that would be saved;
the reduction in the number of complaints received by a utility company;
the extra life expectancy of a kidney transplant patient.
Examples of measures of risk (cost) are as follows:
the amount of capital invested;
the probability of exceeding a schedule deadline;
the probability of financial loss;
the conditional mean loss;
the standard deviation or variance of profit or cashflow;
the probability of introduction of a disease;
the semi-standard deviation of loss;


Figure 5.29 Example risk-return plot.

the number of employees that would be made redundant;
the increased number of fatalities;
the level of chemical emission into the environment.

5.4 Statistical Methods of Analysing Results
Monte Carlo add-ins offer a number of statistical descriptions to help analyse and compare results.
There are also a number of other statistical measures that you may find useful. I have categorised the
statistical measures into three groups:

1. Measures of location - where the distribution is "centered".
2. Measures of spread - how broad the distribution is.
3. Measures of shape - how lopsided or peaked the distribution is.
In general, at Vose Consulting we use very few statistical measures in writing our reports. The
following statistics are easy to understand and, for nearly any problem, communicate all the information
one needs to get across:

• the mean, which tells you where the distribution is located and has some important properties for
  comparing and combining risks;
• cumulative percentiles, which give the probability statements that decision-makers need (like the
  probability of being above or below X or between X and Y);
• relative measures of spread: normalised standard deviation (occasionally) for comparing the level
  of uncertainty of different options relative to their size (i.e. as a dimensionless measure) where the
  outputs are roughly normal, and normalised interpercentile range (more commonly) for the same
  purpose where the outputs being compared are not all normal.

5.4.1 Measures of location
There are essentially three measures of central tendency (i.e. measures of the central location of a
distribution) that are commonly provided in statistics reports: the mode, the median and the mean.
These are described below, along with the conditional mean, which the reader may find more useful in
certain circumstances.
Mode

The mode is the output value that is most likely to occur (Figure 5.30).
For a discrete output, this is the value with the greatest observed frequency. For a continuous distribution output, the mode is determined by the point at which the gradient of the cumulative distribution
of the model output generated values is at its maximum.
The estimate of the mode is quite imprecise if a risk analysis output is continuous or if it is discrete
and the two (or more) most likely values have similar probabilities (Figure 5.31). In fact the mode is
of no practical value in the assessment of most risk analysis results, and, as it is difficult to determine
precisely, it should generally be ignored.
Median x50

The median is the value above and below which the model output has generated equal numbers of data,
i.e. the 50th percentile. This is simply another cumulative percentile and, in most cases, has no particular
benefits over any other percentile.


Figure 5.31 A discrete distribution with two modes, or no mode, depending on how you look at it.

Mean x̄

This is the average of all the generated output values. It has less immediate intuitive appeal than the
mode or median but it does have far more value. One can think of the mean of the output distribution
as the x-axis point of balance of the histogram plot of the distribution. The mean is also known as
the expected value, although I don't recommend the term as it implies for most people the most likely
value. Sometimes also known as the first moment about the origin, it is the most useful statistic in
risk analysis. The mean of a dataset {x_i} is often given the notation x̄. It is particularly useful for the
following two reasons:

$\overline{a+b} = \bar{a} + \bar{b}$ (and therefore $\overline{a-b} = \bar{a} - \bar{b}$), and, for uncorrelated variables, $\overline{a \cdot b} = \bar{a}\,\bar{b}$

where a and b are two stochastic variables. In other words: (1) the mean of the sum is the sum of their
means; (2) the mean of their product is the product of their means. These two results are very useful if
one wishes to combine risk analysis results or look at the difference between them.
Conditional mean

The conditional mean is used when one is only interested in the expected outcome of a portion of the
output distribution; for example, the expected loss that would occur should the project fail to make a
profit. The conditional mean is found by calculating the average of only those data points that fall into
the scenario in question. In the example of expected loss, it would be found by taking the average of
all the profit output's data points that were negative.
The conditional mean is sometimes accompanied with the probability of the output falling within the
required range. In the loss example, it would be the probability of producing a negative profit.
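A minimal sketch of the expected-loss example, assuming Python with numpy and a hypothetical profit output:

import numpy as np

rng = np.random.default_rng(12)
profit = rng.normal(50_000, 80_000, size=20_000)   # assumed profit output values

losses = profit[profit < 0]
p_loss = len(losses) / len(profit)        # probability of making a loss
conditional_mean_loss = losses.mean()     # expected profit, given that a loss occurs

print(f"P(loss) = {p_loss:.1%}, conditional mean loss = {conditional_mean_loss:,.0f}")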
Relative positions of the mode, median and mean

For any unimodal (single-mode) distribution that is positively skewed (i.e. has a longer right tail than
left tail), the mode, median and mean fall in that order (Figure 5.32).


Figure 5.32 Relative positions of the mode, median and mean of a univariate distribution.


If the distribution has a longer left tail than right, the order is reversed. Of course, if the distribution
is symmetric and unimodal, like the normal or Student distributions, the mode, median and mean will
be equal.

5.4.2 Measures of spread
The three measures of spread commonly provided in statistics reports are the standard deviation s, the
variance V and the range. There are several other measures of spread, discussed below, that the reader
may also find useful under certain circumstances.

Variance V

Variance is calculated on generated values as follows:

$V = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

i.e. it is essentially the average of the squared distance of all generated values from their mean. The
larger the variance, the greater is the spread. The variance is called the second moment (because of its
square term) about the mean and has units that are the square of the variable. So, if the output is in £,
the variance is measured in £², making it difficult to have any intuitive feel for the statistic.
Since the distance between the mean and each generated value is squared, the variance is far more
sensitive to the data points that make up the tails of the distribution. For example, a data point that was
three units from the mean would contribute 9 times as much (3² = 9) to the variance as a data point

that was only one unit from the mean (1² = 1). The variance is useful if one wishes to determine the
spread of the sum of several uncorrelated variables X, Y as it follows these rules:

V(X + Y) = V(X) + V(Y)
V(X - Y) = V(X) + V(Y)
V(nX) = n²V(X), where n is some constant

These formulae also provide us with a guideline of how uniformly to disaggregate an additive model
so that each component provides a roughly equal contribution to the total output uncertainty. If the
model sums a number of variables, the contribution of each variable to the output uncertainty will be
approximately equal if each variable has about the same variance.
Standard deviation s

Standard deviation is calculated as the square root of the variance. In other words:

s = √V
It has the advantage over the variance that it is in the same units as the output to which it refers.
However, it is still summing the squares of the distances of each generated value from the mean and
is therefore far more sensitive to outlying data points that make up the tails of the distribution than to
those that are close to the mean.
The standard deviation is frequently used in connection with the normal distribution. Results in risk
analysis are often quoted using the output's mean and standard deviation, implicitly assuming that the
output is normally distributed, and therefore:

• the range x̄ - s to x̄ + s contains 68 % or so of the distribution;
• the range x̄ - 2s to x̄ + 2s contains 95 % or so of the distribution.

Some care should be exercised here. The distribution of a risk analysis output is often quite skewed
and these assumptions do not then follow at all. However, Tchebysheff's rule provides some weak
interpretation of the fraction of a distribution contained within k standard deviations.
Range

The range of an output is the difference between the maximum and minimum generated values. In
most cases this is not a very useful measure as it is obviously only sensitive to the two extreme values
(which are, after all, randomly generated and could often take a wide range of legitimate values for any
particular model).


Mean deviation (MD)

The mean deviation is calculated as

$MD = \frac{1}{n}\sum_{i=1}^{n}\lvert x_i - \bar{x}\rvert$

i.e. the average of the absolute differences between the data points and their mean. This can be thought
of as the expected distance that the variable will actually be from the mean. The mean deviation offers
two potential advantages over the other measures of spread: it has the same units as the output and
gives equal weighting to all generated data points.
Semi-variance V_s and semi-standard deviation s_s

Variance and standard deviation are often used as measures of risk in the financial sector because they
represent uncertainty. However, in a distribution of cashflow, a large positive tail (equivalent to the
chance of a large income) is not really a "risk", although this tail will contribute to, and often dominate,
the value of the calculated standard deviation and variance.
The semi-standard deviation and semi-variance compensate for this problem by considering only those
generated values below (or above, as required) a threshold, the threshold delineating those scenarios
that represent a "risk" and therefore should be included from those that are not a risk and therefore
should be excluded (Figure 5.33).
The semi-variance and semi-standard deviation are

$V_s = \frac{1}{k}\sum_{i=1}^{k}(x_i - x_0)^2 \quad\text{and}\quad s_s = \sqrt{V_s}$

where x_0 is the specified threshold value and x_1, ..., x_k are all of the data points that are either above
or below x_0, as required.

Figure 5.33 The semi-standard deviation concept.
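A minimal sketch of the calculation, matching the formula above, assuming Python with numpy, a hypothetical cashflow output and a threshold of zero:

import numpy as np

rng = np.random.default_rng(13)
cashflow = rng.lognormal(mean=10, sigma=1.0, size=50_000) - 20_000   # assumed cashflow output

x0 = 0.0                                   # threshold delineating the "risk" scenarios
downside = cashflow[cashflow < x0]
semi_variance = np.mean((downside - x0) ** 2)
semi_std = np.sqrt(semi_variance)

print(f"standard deviation {cashflow.std():,.0f}, semi-standard deviation {semi_std:,.0f}")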


Normalised standard deviation s_n

This is the standard deviation divided by the mean, i.e.

s_n = s / x̄

It achieves two purposes:

1. The standard deviation is given as a fraction of its mean. Using this statistic allows the spread of
   the distribution of a variable with a large mean and correspondingly large standard deviation to be
   compared more appropriately with the spread of the distribution of another variable with a smaller
   mean and a correspondingly smaller standard deviation.
2. The standard deviation is now independent of its units. So, for example, the relative variability of
   the EUR:HKD and USD:GBP exchange rates can be compared.

The normalised interpercentile range works in the same way: (x_B - x_A) / x_50, where x_B > x_A are
percentiles like x95 and x05 respectively.

Interpercentile range

The interpercentile range of an output is calculated as the difference between two percentiles, for
example:

• x95 - x05, to give the central 90 % range;
• x90 - minimum, to give the lower 90 % range;
• x90 - x10, to give the central 80 % range.

The interpercentile range is a stable measure of spread (unless one of the percentiles is the minimum
or maximum), meaning that the value is quickly obtained for relatively few iterations of a model. It
also has the great advantage of having a consistent interpretation between distributions.
One potential problem you should be aware of is with applying an interpercentile range calculation to
a discrete distribution, particularly when there are only a few important values, as shown in Figure 5.34.
In this example, several key cumulative percentiles fall on the same values, so of course several
different interpercentile ranges take the same values. In addition, the interpercentile range becomes very
sensitive to the percentile chosen: ranges based on slightly different percentiles can give markedly
different values.
5.4.3 Measures of shape
Skewness S

This is the degree to which the distribution is "lopsided". A positive skewness means a longer right tail;
a negative skewness means a longer left tail; zero skewness means the distribution is symmetric about
its mean (Figure 5.35).


Figure 5.34 Demonstration of how interpercentile ranges can be confusing with discrete distributions.

Figure 5.35 Skewness examples.


The skewness S is calculated as

$S = \frac{1}{n\,s^3}\sum_{i=1}^{n}(x_i - \bar{x})^3$

The s³ factor is put in to make the skewness a pure number, i.e. it has no units of measurement.
Skewness is also known as the third moment about the mean and is even more sensitive to the data
points in the tails of a distribution than the variance or standard deviation because of the cubed term.
It may be useful to note, for comparative purposes, that an exponential distribution has a skewness of
2.0, an extreme value distribution has a skewness of 1.14, a triangular distribution has a skewness of
between 0.562 and 0, and the skewness of a lognormal distribution goes from zero to infinity as its mean
approaches 0. Skewness has little practical purpose for most risk analysis work, although it is sometimes
used in conjunction with kurtosis (see below) to test whether the output distribution is approximately
normal. High skewness values from a simulation run are really quite unstable - if your simulation gives
a skewness value of 100, say, think of it as "really big" rather than taking its value as being usable.
Another measure of skewness, though rarely used, is the percentile skewness, S_p, calculated as

S_p = (x_90 - x_50) / (x_50 - x_10)

It has the advantage over the standard skewness of being quite stable because it is not affected by the
values of the extreme data points. However, its scaling is different from standard skewness: if 0 < S_p < 1
the distribution is negatively skewed; if S_p = 1 the distribution is symmetric; if S_p > 1 the distribution
is positively skewed.
Kurtosis K

Kurtosis is a measure of the peakedness of a distribution. Like skewness statistics, it is not of much use
in general risk analysis. Kurtosis is calculated as

K = Σ_{i=1}^n (x_i - x̄)⁴ / (n σ⁴)

In a similar manner to skewness, the σ⁴ factor is put in to make the kurtosis a pure number. Kurtosis is
often known as the fourth moment about the mean and is even more sensitive to the values of the data
points in the tails of the distribution than the standard skewness statistic. Stable values for the kurtosis
of a risk analysis result therefore require many more iterations than for other statistics. High kurtosis
values from a simulation run are very unstable - if your simulation gives a kurtosis in the hundreds or
thousands, say, it means there is a big spike in the output and the simulation kurtosis is very dependent
on whether that spike was appropriately sampled, so for such large values just think of it as "really big".
Kurtosis is sometimes used in conjunction with the skewness statistics to determine whether an output
is approximately normally distributed. A normal distribution has a kurtosis of 3, so any output that looks
symmetric and bell-shaped and has a zero skewness and a kurtosis of 3 can probably be considered normal.
A uniform distribution has a kurtosis of 1.8, a triangular distribution has a kurtosis of 2.387, the
kurtosis of a lognormal distribution goes from 3.0 to infinity as its mean approaches 0 and an exponential

distribution has a kurtosis of 9.0. The kurtosis statistic is sometimes (in Excel, for example) calculated as
K - 3, called the excess kurtosis, which can cause confusion, so be careful what statistic your software is
reporting.
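To make the normality check concrete, the following minimal Python sketch (not from the book) computes the moment-based skewness and kurtosis defined above for an array of simulation results and compares the kurtosis against the normal benchmark of 3.

import numpy as np

def skewness_kurtosis(values):
    """Return (S, K) using the definitions sum((x - mean)^p) / (n * sigma^p)."""
    x = np.asarray(values, dtype=float)
    mu = x.mean()
    sigma = x.std()                     # population standard deviation
    s = np.mean((x - mu) ** 3) / sigma ** 3
    k = np.mean((x - mu) ** 4) / sigma ** 4
    return s, k

rng = np.random.default_rng(2)
output = rng.normal(50.0, 5.0, size=50_000)
S, K = skewness_kurtosis(output)
print(f"skewness = {S:.3f} (0 for normal), kurtosis = {K:.3f} (3 for normal)")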

5.4.4 Percentiles


Cumulative percentiles

These are values below which the specified percentage of the generated data for an output fall. Standard
notation is x_P, where P is the cumulative percentage, e.g. x_0.75 is the value that 75 % of the generated
data were less than or equal to.
The cumulative percentiles can be plotted together to form the cumulative frequency plot, the use of
which has been explained above.
Differences between cumulative percentiles are often used as a measure of the variable's range, e.g.
x_0.95 - x_0.05 would include the middle 90 % of the possible output values and x_0.80 - x_0.20 would include
the middle 60 % of the possible values of the output; x_0.25, x_0.50 and x_0.75 are sometimes referred to as
the quartiles.
Relative percentiles

The relative percentiles are the fractions of the output data points that fall into each bar range of a
histogram plot. They are of little use in most risk analyses and are dependent upon the number of bars
that are used to plot the histogram.
Relative percentiles can be used to replicate the output distribution for inclusion in another risk analysis
model. For example, cashflow models may have been produced for a number of subsidiaries of a large
company. If an analyst wants to combine these uncertain cashflows into an aggregate model, he would
want distributions of the cashflow from each subsidiary. This is achieved by using histogram distributions
to model each subsidiary's cashflow and taking the required parameters (minimum, maximum, relative
percentiles) from the statistics report. Providing the cashflow distributions are independent, they can
then be summed in another model.
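The idea of replicating each subsidiary's output distribution and then summing the independent cashflows can also be sketched outside a spreadsheet. The Python example below is illustrative only: it resamples from each subsidiary's stored simulation values (a simple empirical stand-in for a histogram distribution) and adds the samples, which is only valid if the subsidiary cashflows are independent.

import numpy as np

rng = np.random.default_rng(3)

# Stand-ins for cashflow outputs exported from three subsidiary models
subsidiary_runs = [
    rng.normal(12.0, 3.0, 5000),
    rng.lognormal(mean=2.0, sigma=0.4, size=5000),
    rng.normal(8.0, 1.5, 5000),
]

# Aggregate model: draw one value per subsidiary per iteration and sum them
iterations = 10_000
total = sum(rng.choice(runs, size=iterations, replace=True) for runs in subsidiary_runs)

print("mean aggregate cashflow:", total.mean())
print("5th and 95th percentiles:", np.percentile(total, [5, 95]))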

5.4.5 Stochastic dominance tests
Stochastic dominance tests are a statistical means of determining the superiority of one distribution
over another. There are several types (or degrees) of stochastic dominance. We have never found any
particular use for any but the first- and second-order tests described here. It would be a very rare problem
where the distributions of two options can be selected for no better reason than a very marginal ordering
provided by a statistical test. In the real world there are usually far more persuasive reasons to select
one option over another: option A would expose us to a greater chance of losing money than B; or
a greater maximum loss; or would cost more to implement; we feel more comfortable with option A
because we've done something similar before; option B will make us more strategically placed for the
future; option B is based on an analysis with fewer assumptions; etc.



Figure 5.36 First-order stochastic dominance: FA < FB, so option A dominates option B.

First-order stochastic dominance

Consider options A and B having the distribution functions FA(x) and FB(x), where it is desirable to
maximise the value of x.
If FA(x) ≤ FB(x) for all x, then option A dominates option B. That amounts to saying that the cdf of
option A is to the right of that of option B in an ascending plot. This is shown graphically in Figure 5.36.
Option A has a smaller probability than option B of being less than or equal to each x value, so it is
the better option (unless FA(x) = FB(x) everywhere). First-order stochastic dominance is intuitive and
makes virtually no assumptions about the decision-maker's utility function, only that it is continuous
and monotonically increasing with increasing x.
Second-order stochastic dominance

If

D(z) = ∫_min^z (FB(x) - FA(x)) dx ≥ 0

for all z, then option A dominates option B. Figure 5.37 illustrates how this looks graphically. Figure 5.38
illustrates a situation when second-order stochastic dominance does not hold.
Second-order stochastic dominance makes the additional assumption that the decision-maker has a
risk averse utility function over the entire range of x. This assumption is not very restrictive and can
almost always be assumed to apply. In most fields of risk analysis (finance being an obvious exception)
it will not be necessary to resort to second-degree (or higher) dominance tests since the decision-maker
should be able to find other, more important, differences between the available options.
Stochastic dominance is great in principle but tends to be rather onerous to apply in practice, particularly if one is comparing several possible options. ModelRisk has the facility to compare as many options
as you wish. First of all one simulates, say, 5000 iterations of the outcome of each possible option and
imports these into contiguous columns in a spreadsheet. These are then fed into the ModelRisk interface,
as shown in Figure 5.39.
Selecting an output location allows you to insert the stochastic dominance matrix as an array function
(VoseDominance), which will show all the dominance combinations and update if the simulation output
arrays are altered.
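If you are not using the VoseDominance array function, the same first- and second-order tests can be run directly on two columns of simulated outcomes. The Python sketch below is a simple illustration under the assumption that larger values are better; it compares empirical cdfs on a common grid and approximates D(z) numerically.

import numpy as np

def dominance(a, b):
    """Check whether option A dominates option B (higher values preferred)."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)   # empirical cdf of A
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)   # empirical cdf of B
    first_order = np.all(Fa <= Fb)                                  # FA(x) <= FB(x) everywhere
    dz = np.cumsum((Fb - Fa) * np.diff(grid, prepend=grid[0]))      # D(z) = integral of (FB - FA)
    second_order = np.all(dz >= -1e-12)
    return first_order, second_order

rng = np.random.default_rng(4)
option_a = rng.normal(110, 20, 5000)   # hypothetical simulated outcomes for option A
option_b = rng.normal(100, 20, 5000)   # hypothetical simulated outcomes for option B
print(dominance(option_a, option_b))   # (first-order?, second-order?)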

Figure 5.37 Second-order stochastic dominance: option A dominates option B because D(z) is always > 0.

Figure 5.38 Second-order stochastic dominance: option A does not dominate option B because D(z) is not
always > 0.

5.4.6 Value-of-information methods
Value-of-information (VOI) methods determine the worth of acquiring extra information to help the
decision-maker. From a decision analysis perspective, acquiring extra information is only useful if it
has a significant probability of changing the decision-maker's currently preferred strategy. The penalty
of acquiring more information is usually valued as the cost of that extra information, and sometimes
also the delay incurred in waiting for the information.

Figure 5.39 ModelRisk interface to determine stochastic dominance.

VOI techniques are based on analysing the revised estimates of model inputs that come with extra
data, together with the costs of acquiring the extra data and a decision rule that can be converted into
a mathematical formula to analyse whether the decision would alter. The ideas are well developed
(Clemen and Reilly (2001) and Morgan and Henrion (1990), for example, explain VOI concepts in
some detail), but the probability algebra can be somewhat complex, and simulation is more flexible and
a lot easier for most VOI calculations.
The usual starting point of a VOI analysis is to consider the value of perfect information (VOPI), i.e.
answering the question "What would be the benefit, in terms we are focusing on (usually money, but it
could be lives saved, etc.), of being able to know some parameter(s) perfectly?". If perfect knowledge
would not change a decision, the extra information is worthless, and, if it does change a decision, then
the value of the extra knowledge is the difference in expected net benefit between the new selected
option and that previously favoured. VOPI is a useful limiting tool, because it tells us the maximum
value that any data may have in better evaluating the input parameter of concern. If the information
costs more than that maximum value, we know not to pursue it any further.
After a VOPI check, one then looks at the value of imperfect information (VOII). Usually, the
collection of more data will decrease, not eliminate, uncertainty about an input parameter, so VOII
focuses on whether the decrease in uncertainty is worth the cost of collecting extra information. In fact,
if new data are inconsistent with previous data or beliefs that were used to estimate the parameter, new
data may even increase the uncertainty.
If the data being used are n random observations (e.g. survey or experimental results), the uncertainty
about the value of a parameter has a width (roughly) proportional to 1/SQRT(n). So, if you already
have n observations and would like to halve the uncertainty, you will need a total of 4n observations
(an increase of 3n). If you want to decrease uncertainty by a factor of 10, you will need a total of 100n
observations (an increase of 99n). In other words a decrease in uncertainty about a parameter value


becomes rapidly more expensive (the number of observations required grows with the square of the
reduction factor) the closer the uncertainty gets to zero. Thus, if a VOPI analysis
shows that it is economically justified to collect more information before making a decision, there will
certainly be a point in the data collection where the cost of collecting data will outweigh their benefit.
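The 1/SQRT(n) scaling gives a quick rule for how many extra observations a target reduction in uncertainty requires. The short Python sketch below is just that arithmetic, not part of the book's models.

import math

def extra_observations(n_current, reduction_factor):
    """Observations needed (total, extra) to shrink the uncertainty width by the given factor."""
    n_total = math.ceil(n_current * reduction_factor ** 2)   # width ~ 1/sqrt(n)
    return n_total, n_total - n_current

print(extra_observations(200, 2))    # halve the uncertainty  -> (800, 600)
print(extra_observations(200, 10))   # ten-fold reduction     -> (20000, 19800)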
VOPI analysis method

Consider the range of possible values for the parameter(s) for which you could collect more information.
Determine whether there are possible values for these parameters that, if known, would make the
decision-maker select a different option from the one currently deemed to be best.
Calculate the extra value (e.g. expected profit) that the more informed decision would give. This
is the VOPI.
VOll analysis method

Start with a prior belief about a parameter (or parameters), based on data or opinion.
Model what observations might be made with new data using the prior belief.
Determine the decision rule that would be affected by these new data.
Calculate any improvement in the decision capability given the new data; the measure of improvement requires some valuation and comparison of possible outcomes, which is usually taken to be
expected monetary or utility value, although this is rather restrictive.
Determine whether any improvement in the decision capability exceeds the cost of the extra information.
VOI example

Your company wants to develop a new cosmetic but there is some concern that people will have a minor
adverse skin reaction to the product. The cost of development of the product to market is $1.8 million.
The revenue NPV (including the cost of development) if the product is of the required quality is
$3.7 million.
Cosmetic regulations state that you will have to withdraw the product if 2 % or more of consumers
have an adverse reaction to your product. You have already performed some preliminary trials on 200
random people selected from the target demographic, at a cost/person of $500. Three of those people
had an adverse reaction to the product.
Management decide the product will only be developed if they can be 85 % confident that the product
will affect less than the required 2 % of the population. Decision question: Should we test more people
or just abandon the product development now? If we should test more people, then how many more?
Having observed three affected people out of 200, our prior belief about p can be modelled as
Beta(3 + 1, 200 - 3 + 1) = Beta(4, 198), which gives a 57.24 % confidence that 2 % or less of the target
demographic will be affected (calculated as VoseBetaProb(2 %, 4, 198, 1) or BETADIST(2 %, 4, 198)).
Thus, the current level of information means that management would not pursue development of
the product, with no resultant cost or revenue, i.e. a net revenue of $0. However, the beta distribution
shows that it is quite possible that p is less than 2 %, and we could be losing a good opportunity
by quitting now. If this were known for sure, the company would get a profit of $3.7 million, so the
VOPI = $3.7 million * 57.24 % + $0 million * 42.76 % = $2.12 million, and each test only costs $500;
it is certainly possible that more information could be worth the expense.
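The VOPI arithmetic in this example is easy to reproduce. The Python sketch below is illustrative only (it uses scipy's beta cdf in place of VoseBetaProb or BETADIST) and recomputes the 57.24 % confidence and the roughly $2.12 million VOPI.

from scipy.stats import beta

s, n = 3, 200                      # 3 adverse reactions observed in 200 trials
prior = beta(s + 1, n - s + 1)     # Beta(4, 198) prior for the prevalence p
conf_ok = prior.cdf(0.02)          # confidence that p <= 2 %
print(f"confidence p <= 2%: {conf_ok:.4f}")           # about 0.5724

npv_success = 3.7e6                # revenue NPV if the product can go ahead
vopi = npv_success * conf_ok + 0.0 * (1 - conf_ok)    # value with perfect knowledge of p
print(f"VOPI = ${vopi / 1e6:.2f} million")            # about $2.12 million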



VOll analysis

The model in Figure 5.40 performs the VOII steps described above. The parameter of concern is the
fraction of people (prevalence), p, in the target demographic (women 18-65) who would have an
adverse reaction, with a prior uncertainty described by Beta(4, 198), cell C12.
The people in the study are randomly sampled from this demographic, so if we test m extra people
(cell C22) we can assume the number of people who would be adversely affected, s, would follow a
Binomial(m, p) distribution (cell C24).
The revised estimate for p would then become Beta(4 + s, 198 + (m - s)). The confidence we then
have that p is < 2 % is given by VoseBetaProb(2 %, 4 + s, 198 + (m - s), 1), cell C27. If this confidence
exceeds 85 %, management would take the decision to develop the product (cells C31:C32).
The model simulates different possible values of p from the prior. It models various possible numbers
of extra tests, m, and simulates the extra data generated (s out of m), then evaluates the expected return of
the resultant decision. Of course, although one may have reached the required confidence for p, the true
value for p doesn't change and a bad decision may still be taken. The value of information is calculated
for each iteration, and the mean function is used to calculate the expected value of information.
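The same VOII logic can be sketched outside the spreadsheet. The Python example below is a simplified stand-in for the Figure 5.40 model, with values taken from the text ($3.7 million NPV, $500 per test, an 85 % confidence rule and a Beta(4, 198) prior); it also assumes that a product developed and then withdrawn simply loses the $1.8 million development cost. It estimates the expected value of testing m extra people, net of the testing cost.

import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(7)

PRIOR_A, PRIOR_B = 4, 198                 # Beta(4, 198) prior for prevalence p
NPV, LOSS, TEST_COST, LIMIT, RULE = 3.7e6, -1.8e6, 500.0, 0.02, 0.85

def expected_voii(m, iterations=20_000):
    p = rng.beta(PRIOR_A, PRIOR_B, size=iterations)        # "true" p drawn from the prior
    s = rng.binomial(m, p)                                  # adverse reactions in m new tests
    conf = beta.cdf(LIMIT, PRIOR_A + s, PRIOR_B + (m - s))  # revised confidence that p <= 2 %
    develop = conf >= RULE                                  # decision rule
    payoff = np.where(develop, np.where(p < LIMIT, NPV, LOSS), 0.0)
    # the baseline (no extra tests) is "don't develop" = $0, so the mean payoff is the VOII
    return payoff.mean() - m * TEST_COST

for m in (100, 400, 700, 1500, 3000):
    print(m, round(expected_voii(m)))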
Note that for this example the question being posed is how many more people to test in one go. A
more optimal strategy would be to test a smaller number, review the results and perform a VOII analysis.
This iterative process will either achieve the required confidence at a smaller test cost or lead one to
abandon further testing because one is fairly sure that the required performance will not be achieved.
It might at first seem that we are getting something for nothing here. After all, we don't actually
know anything more until we perform the extra tests. However, the decision that would be made would
depend on the results of those extra tests, and those results depend on what the true value of p actually

+

i

Perfect knowledge
Decisionwith perfect information:

+

+

+

=IF(Cl 9=1 ,"Deveiop","Don't
develop")
=C22'E8/1000000

106

Risk Analys~s

is. Thus, the analysis is based on our prior for p (i.e. what we know to date about p) and the decision
rule. When the model generates a scenario, it selects a value from the prior for p. It is saying: "Let's
imagine that this is the true value for p". If that value is < 2 %, we should develop the product of
course, but we'll never know the value of p (until we have launched the product and have enough
customer history to know its value). However, extra tests will get us closer to knowing its true value,
Figure 5.41 VOI example model results.

Figure 5.42 VOI example model results where tests have no cost.

and so we end up taking less of a gamble. When the model picks a small value for p, it will probably
generate a small number of affected people in our new tests, and our interpretation of this small number
as meaning p is small will often be correct. The danger is that a high p value could by chance result in
an unrepresentatively small fraction of m being affected, which will be misinterpreted as a small p and
lead management to make the wrong decision. However, as m gets bigger, so that risk diminishes. The
balance that needs to be made is that the tests cost money. The model simulates 20 scenarios where m
is varied between 100 and 3000, with the results shown in Figure 5.41.
It tells us that the optimal strategy, i.e. the strategy with the greatest expected VOII, is to perform
about another 700 tests. The sawtooth effect in these plots occurs because of the discrete nature of
the extra number affected that one would observe in the new data. Note that, if the tests had no cost,
the graph above would look very different (Figure 5.42). Now it is continually worth collecting more
information (providing it is actually feasible to do) because there is no penalty to be paid in running
more tests (except perhaps time, which is not included as part of this problem). In this case the value
of information asymptotically approaches the VOPI (= $2.12 million) as the number of people tested
approaches infinity.

Part 2
Introduction
Part 2 constitutes the bulk of this book and covers a wide range of risk analysis modelling techniques that are
in general use. I have again almost exclusively used Microsoft Excel as the modelling environment because
it is ubiquitous and makes it easy to show the principles of a model with printouts of the spreadsheet.
I have also used Vose Consulting's ModelRisk add-in to Excel (see Appendix II), but I have done my
best to avoid making this book a glorified advertisement for a software tool. The reality is that you will
need some specialist software to do risk analysis. Using ModelRisk gives me the opportunity to explain
the thinking behind risk analysis modelling without the message getting lost in very long calculations or
wrestling with the mechanical limitations of modelling in spreadsheets. Some of the simpler functions
in ModelRisk are available in other risk analysis software tools, and Excel has some statistical functions
(although they are of dubious quality). When I have used more complex functions in ModelRisk (like
copulas or time series, for example), I have tried to give you enough information for you to do it
yourself. Of course, we'd love you to buy ModelRisk - there is a lot more in the software than I have
used in this book (Appendix II gives some highlights and explains how ModelRisk interacts with other
risk analysis spreadsheet add-ins), it has a lot of very nice user-interfaces and its routines can be called
from C++ and VBA. We offer an extended demo period for ModelRisk on the inside back-cover of this
book, together with files for the models created for this book that you can play around with.
Notation used in the spreadsheet models

I have given printouts of spreadsheet models throughout this book. The models were produced in
Microsoft Excel version 2003 and ModelRisk version 2.0 which complies with the standard Excel
rules for cell formulae. The equations easily translate to @RISK, Crystal Ball and other Monte Carlo
simulation packages where they have similar functions. In each spreadsheet, I have given a formulae
table so that the reader can follow and reproduce the model, for example:


Here you'll see an entry for cells D2:D8 as = VoseLognormal(B2, C2). Where I have given one formula
for a range of cells, it refers to the first cell of the range, and the formulae for other cells in the range
are those that would appear by copying that formula over, for example by using the Excel Autofill
facility. The formulae in the other cells in the range will vary according to their different position. So,
for example, copying the formula above into the other cells would give:
D3: =VoseLognormal(B3, C3), D4: =VoseLognormal(B4, C4)
etc.
If the formula had included a fixed reference using the "$" symbol in Excel notation, e.g. =
VoseLognormal(B$2, C2), it would have copied down as
D3: =VoseLognormal(B$2, C3), D4: =VoseLognormal(B$2, C4)
etc.
The VoseLognormal function generates random samples from a lognormal distribution, a very common
distribution that features in pretty much all Monte Carlo simulation add-ins to Excel. So, for example,
VoseLognormal(2,3) could be replaced as follows:

@RISK = RiskLognorm(2,3)
Crystal Ball = CB.Lognormal(2, 3)
There are maybe a dozen other, less common, Monte Carlo add-ins with varying levels of sophistication, and they all follow the same principle, but be careful to ensure that they parameterise a distribution
in the same way.
Excel allows you to input a function as an array, meaning that one function covers several cells.
Array formulae in Excel are inputted by highlighting a range of cells, typing the formula and then
CTRL-SHIFT-Enter together. The function then appears within curly brackets in the formula bar. Array
functions are used rather extensively with ModelRisk. For example:
     A    B        C
1         Value    Shuffled
2         1        3
3         2        5
4         3        2
5         4        6
6         5        4
7         6        1
8         7        7
9

Formulae table
C2:C8    {=VoseShuffle(B2:B8)}

The VoseShuffle function simply randomises the order of the values listed in its parameter array.
You'll see how I display the formula within curly brackets because the VoseShuffle covers that whole
range with one function, which is how it appears when you see it in Excel's formula bar.
Note also that all functions with a name all in upper-case letters are always native Excel functions,
which is how they appear in the spreadsheet. Functions of the form VoseXxxx belong to ModelRisk.
Types of function in ModelRisk

ModelRisk has several types of function that apply to a probability distribution. I'll use the normal
distribution as an example.
VoseNormal(2, 3) generates random values from a normal distribution with mean = 2 and standard
deviation = 3. An optional third parameter (we call it the "U-parameter") is the quantile of the distribution; for example, VoseNormal(2, 3, 0.9) returns the 90th percentile of the distribution. The U-parameter
must obviously lie on [0, 1]. The main use of the U-parameter is to control how random samples are
generated from the distribution: for example, passing a uniform random number generated by @RISK,
Crystal Ball or Excel (e.g. RAND()) as the U-parameter will generate random values from the normal
distribution using that package's random number generator to control the sampling.
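The U-parameter is simply inverse-transform sampling: feed a quantile in [0, 1] to the distribution's inverse cumulative function and you get the corresponding sample. The Python sketch below is illustrative (it uses scipy rather than ModelRisk) and shows how supplying your own uniform random numbers controls the sampling in exactly this way.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)

# VoseNormal(2, 3, 0.9) analogue: the 90th percentile of a Normal(2, 3)
print(norm.ppf(0.9, loc=2, scale=3))

# VoseNormal(2, 3, U) analogue: your own uniform numbers drive the sampling,
# so any external random number generator can be plugged in
u = rng.uniform(0.0, 1.0, size=5)
print(norm.ppf(u, loc=2, scale=3))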
The second type of function calculates probabilities for each distribution featured in ModelRisk. For
example, VoseNormalProb(0.7, 2, 3, FALSE) returns the probability density function of the normal distribution evaluated at x = 0.7, as would VoseNormalProb(0.7, 2, 3, 0) or VoseNormalProb(0.7, 2, 3),
since the last parameter is assumed FALSE if omitted. VoseNormalProb(0.7, 2, 3, TRUE) or VoseNormalProb(0.7, 2, 3, 1) returns the cumulative distribution function of the normal distribution evaluated
at x = 0.7. To this degree, these functions are analogous to Excel's NORMDIST function, e.g.
VoseNormalProb(0.7, 2, 3, TRUE) = NORMDIST(0.7, 2, 3, TRUE).
However, the probability calculation functions can take an array of x values and then return the joint
probability. For example, VoseNormalProb({0.1, 0.2, 0.3}, 2, 3, 0) = VoseNormalProb(0.1, 2, 3, 0) *
VoseNormalProb(0.2, 2, 3, 0) * VoseNormalProb(0.3, 2, 3, 0). There are two advantages to this feature:
we don't need a vast array of functions to calculate the joint probability for a large dataset, and the
functions are far faster and more accurate than multiplying a long array because, depending on the
distribution, there will be a lot of calculations that can be simplified. Joint probabilities can quickly tend
to very small values, beyond the range that Excel can handle, so ModelRisk also offers log base 10 versions
of these functions.

These functions allow us to develop very efficient log likelihood models, for example, which we can
then optimise to fit to data (see Chapter 10).
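The underflow problem is easy to demonstrate. The Python sketch below is illustrative (not ModelRisk syntax): it computes a joint likelihood for a large dataset both by direct multiplication, which underflows to zero, and by summing log densities, which is the log-likelihood approach used for fitting in Chapter 10.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)
data = rng.normal(2.0, 3.0, size=5000)

dens = norm.pdf(data, loc=2, scale=3)
print("direct product of densities:", np.prod(dens))   # underflows to 0.0
print("log-likelihood:", np.sum(norm.logpdf(data, loc=2, scale=3)))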
Finally, ModelRisk offers what we call object functions, for example VoseNormalObject(2, 3). If you
type =VoseNormalObject(2, 3) into a cell, it returns the string "VoseNormalObject(2, 3)". In many types
of risk analysis calculation we want to do more with a distribution than simply take a random sample
or calculate a probability. For example, we might want to determine its moments (mean, variance, etc.).
The following model does this for a Gamma(3, 7) distribution in two different ways:

The VoseMoments array function returns the first four moments of a distribution and takes as its input
parameter the distribution type and parameter values. There are many other situations in which we want
to manipulate distributions as objects: for example, an aggregate function that uses a hybrid Monte Carlo
approach to add n Lognormal(10, 5) distributions together,
where n is itself a Poisson(50) random variable. Note that the lognormal distribution is defined as an
object here because we are using the distribution many times, taking on average 50 independent samples
from the distribution for each execution of the function. However, the Poisson distribution is not an
object because for one execution of the function it simply draws a single random sample. Objects can
be embedded into other objects too: for example, one can construct the object for a distribution made by
splicing a gamma distribution (left) and a shifted Pareto2 distribution (right) together at x = 3. Allowing
objects to exist alone in cells (e.g. cell F3 in the above figure) allows us to create very transparent and
efficient models.
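The aggregate calculation described above (adding a Poisson(50)-distributed number of Lognormal(10, 5) samples) can be sketched directly. The Python example below is a minimal brute-force Monte Carlo stand-in for the ModelRisk object approach, assuming Lognormal(10, 5) means a lognormal with mean 10 and standard deviation 5.

import numpy as np

rng = np.random.default_rng(10)

# Lognormal specified by its mean and standard deviation (assumed parameterisation)
mean, sd = 10.0, 5.0
sigma2 = np.log(1.0 + (sd / mean) ** 2)
mu = np.log(mean) - sigma2 / 2.0

def one_aggregate():
    n = rng.poisson(50)                                    # random number of items
    return rng.lognormal(mu, np.sqrt(sigma2), size=n).sum()

totals = np.array([one_aggregate() for _ in range(20_000)])
print("mean of aggregate:", totals.mean())    # should be close to 50 * 10 = 500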
Mathematical notation

There are some mathematical notations listed below that the reader will come across in a few parts of
the text. I have tried to keep the algebra to a minimum and the reader should not worry unduly about
this list. There is nothing in this book that really extends beyond the level of mathematics that one
learns in a quantitative undergraduate course.

x : is the label generally given to the value of a variable
θ : is the label generally given to an uncertain parameter
∫_a^b f(x) dx : means the integral between a and b of the function f(x)
Σ_{i=1}^n x_i : means the sum of all x_i values, where i is between 1 and n, i.e. x_1 + x_2 + . . . + x_n
Π_{i=1}^n x_i : means the product of all x_i values, where i = 1 to n, i.e. x_1 x_2 . . . x_n
d/dx f(x) : means the differential of f(x) with respect to x
∂/∂x f(x, y) : means the partial derivative of a function of x and y, f(x, y), with respect to x
≈ : means "is approximately equal to"
≤, ≥ : mean "is less than or equal to" and "is greater than or equal to"
<<, >> : mean "is much less than" and "is much greater than"
x! : means "x-factorial", = 1 * 2 * 3 * . . . * x, or Π_{i=1}^x i
exp[x] or e^x : means "exponential x" = 2.7182818. . .^x
ln[x] : means the natural logarithm of x, so ln[exp[x]] = x
x̄ : means the average of all x values
|x| : means "modulus x", the absolute value of x
Γ(x) : is the gamma function evaluated at x: Γ(x) = ∫_0^∞ u^(x-1) e^(-u) du
B(x, y) : is the beta function evaluated at (x, y): B(x, y) = ∫_0^1 t^(x-1) (1-t)^(y-1) dt = Γ(x)Γ(y)/Γ(x+y)

Other special functions are explained in the text where they appear. For those readers with some
background in probability modelling, you might not be used to the notation I use for stating that a
variable follows some distribution. For example, I write:

X = Normal(100, 10)

whereas the reader might be used to

X ~ Normal(100, 10)

I use the "=" notation because it is easier to write formulae that combine variables and it reflects how
one uses Excel. For example, where I might write

X = Normal(100, 10) + Gamma(2, 3)

using the other notation, we would need to write

Y ~ Normal(100, 10)
Z ~ Gamma(2, 3)
X = Y + Z

which gets to be rather tedious.
This chapter is set out in sections, each of which solves a number of problems in a particular area.
I hope that the problem-solving approach will complement the theory discussed earlier in the book.
References are made to where the theory used in the problems is more fully discussed. The solution to
each problem finishes with the symbol +.

Chapter 6

Probability mathematics
and simulation
This chapter explores some very basic theories of probability and statistics that are essential for risk
analysis modelling and that we need to understand before moving on. In my experience, ignorance of
these fundamentals is a prime cause of the logical failure of a model. Risk analysis software is often
sold on the merits of removing the need for any in-depth statistical theory. Although this is quite true
with respect to using the software, it is often not the case when it comes to producing a logical model.
In this chapter we begin by looking at the concepts that are used in the mathematics of probability
distributions. Then we define some basic statistics in common use. We look at a few probability concepts
that are essential to understand if one is to be assured of producing logical models. This chapter is
designed to offer a reference of statistical and probability concepts: the application of these principles
is left to the appropriate chapters later in the book.
For most people (myself included), probability theory and statistics were not their favourite subjects at
college. I would, however, encourage those readers who find themselves equipped with limited endurance
for statistical theory to get at least as far as the end of Section 6.4.4 before moving on.

6.1 Probability Distribution Equations
6.1.1 Cumulative distribution function (cdf)


The (cumulative) distribution function, or probability distribution function, F(x), is the mathematical
equation that describes the probability that a variable X is less than or equal to x, i.e.
F(x) = P(X ≤ x)   for all x

where P(X ≤ x) means the probability of the event X ≤ x.
A cumulative distribution function has the following properties:

1. F(x) is always non-decreasing, i.e. dF(x)/dx ≥ 0.
2. F(x) = 0 at x = -∞; F(x) = 1 at x = +∞.

6.1.2 Probability mass function (pmf)
If a random variable X is discrete, i.e. it may take any of a specific set of n values x_i, i = 1, . . . , n, then

p(x_i) = P(X = x_i)

is called the probability mass function.


Figure 6.1 Distribution of the possible number of heads in three tosses of a coin.

Note that the probability masses must sum to 1, i.e. Σ_{i=1}^n p(x_i) = 1, and that the cumulative
distribution function is obtained by summing the masses: F(x) = Σ_{x_i ≤ x} p(x_i).
For example, if a coin is tossed 3 times, the number of observed heads is discrete. The possible values
of x_i are shown in Figure 6.1 against their probability mass function p(x) and probability distribution
function F(x). In this book, I will often show a discrete variable's probability mass function by joining
together the probability masses with straight lines and marking each allowed value with a point. Vertical
histograms are usually more appropriate representations of discrete variables, but, by using the points-and-lines type of graph, one can show several discrete distributions together in the same plot.

6.1.3 Probability density function (pdf)
If a random variable X is continuous, i.e. it may take any value within a defined range (or sometimes
ranges), the probability of X having any precise value within that range is vanishingly small because
we are allocating a probability of 1 between an infinite number of values. In other words, there is no
probability mass associated with any specific allowable value of X. Instead, we define a probability
density function f(x) as

f(x) = dF(x)/dx
i.e. f (x) is the rate of change (the gradient) of the cumulative distribution function. Since F(x) is
always non-decreasing, f (x) is always non-negative.


So, for a continuous distribution we cannot define the probability of observing any exact value.
However, we can determine the probability of x lying between any two exact values (a, b):
P(a ≤ X ≤ b) = F(b) - F(a)   where b > a          (6.3)

Example 6.1

Consider a continuous variable that takes a Rayleigh(1) distribution. Its cumulative distribution function
is given by

F(x) = 1 - exp(-x²/2)

and its probability density function is given by

f(x) = x exp(-x²/2)

The probability that the variable will be between 1 and 2 is given by

P(1 ≤ X ≤ 2) = F(2) - F(1) = exp(-1/2) - exp(-2) ≈ 0.471
Figure 6.2 Probability density and cumulative probability plots for a Rayleigh(1) distribution.

F(x) and f(x) for this example are shown in Figure 6.2. In this book, we will show a continuous
variable's probability density function with a smooth curve, as illustrated. A square sometimes plotted in

the middle of this curve represents the position of the mean of the distribution. Provided the distribution
is unimodal, if this point is higher than the 50th percentile the distribution will be right skewed, and if
lower than the 50th percentile it will be left skewed. +
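The Rayleigh(1) probability in Example 6.1 is easy to check numerically. The Python sketch below is illustrative only: it evaluates F(2) - F(1) from the cdf given above and confirms it by Monte Carlo simulation.

import numpy as np

F = lambda x: 1.0 - np.exp(-x**2 / 2.0)      # Rayleigh(1) cumulative distribution function
print("F(2) - F(1) =", F(2.0) - F(1.0))      # about 0.471

rng = np.random.default_rng(11)
samples = rng.rayleigh(scale=1.0, size=1_000_000)
print("simulated   =", np.mean((samples > 1.0) & (samples <= 2.0)))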

6.2 The Definition of "Probability"
Probability is a numerical measurement of the likelihood of an outcome of some random process.
Randomness is the effect of chance and is a fundamental property of the system, even if we cannot directly measure it. It is not reducible through either study or further measurement, but may be
reduced by changing the physical system. Randomness has been described as "aleatory uncertainty"
and "stochastic variability". The concept of probability can be developed neatly from two different
approaches:
Frequentist definition

The frequentist approach asks us to imagine repeating the physical process an extremely large number of
times (trials) and then to look at the fraction of times that the outcome of interest occurs. That fraction
is asymptotically (meaning as we approach an infinite number of trials) equal to the probability of that
particular outcome for that physical process. So, for example, the frequentist would imagine that we
toss a coin a very large number of times. The fraction of the tosses that come up heads is approximately
the true probability of a single toss producing a head, and the more tosses we do the closer the fraction
becomes to the true probability. So, for a fair coin, we should see the number of heads stabilise at
around 50 % of the trials as the number of trials gets truly huge. The philosophical problem with this
approach is that one usually does not have the opportunity to repeat the scenario a very large number
of times. How do we match this approach with, for example, the probability of it raining tomorrow, or
you having a car crash?
Axiomatic definition

The physicist or engineer, on the other hand, could look at the coin, measure it, spin it, bounce lasers
off its surface, etc., until one could declare that, owing to symmetry, the coin must logically have a
50 % probability of falling on either surface (for a fair coin, or some other value for an unbalanced
coin, as the measurements dictated). Determining probabilities on the basis of deductive reasoning has
a far broader application than the frequency approach because it does not require us to imagine being
able to repeat the same physical process infinitely.
A third, subjective, definition

In this context, "probability" would be our measure of how much we believe something to be true. I'll
use the term "confidence" instead of probability to make the separation between belief and real-world
probability clear. A distribution of confidence looks exactly the same as a distribution of probability
and must follow the same rules of complementation, addition, etc., which easily lead to mixing up of
the two ideas. Uncertainty is the assessor's lack of knowledge (level of ignorance) about the parameters
that characterise the physical system that is being modelled. It is sometimes reducible through further
measurement or study. Uncertainty has also been called "fundamental uncertainty", "epistemic
uncertainty" and "degree of belief".


6.3 Probability Rules
There are four important probability theorems for risk analysis, the meaning and use of which are
discussed in this section:
strong law of large numbers (also called Tchebysheff's inequality¹);
binomial theorem;
Bayes' theorem;
central limit theorem (CLT).

I will also describe a number of mathematical techniques useful in risk analysis and referenced
elsewhere:
Taylor series;
Tchebysheff's rule (theorem);
Markov inequality;
least-squares linear regression;
rank order correlation coefficient.
We'll begin with some basics on conditional probability, using Venn diagrams to help visualise the
thinking.

6.3.1 Venn diagrams
Venn diagrams are introduced here to help visualise some basic rules of probability. In a Venn diagram
the square area, denoted by E, contains all possible events, and we assign it an area equal to 1. The
circles represent specific events. Probabilities are represented by the ratios of areas. For example, the
probability of event A in Figure 6.3 is the ratio of area A to the total area E:

P(A) = A/E
Figure 6.3 Venn diagram for a single event A.

Mutually exclusive events

Figure 6.4 gives an example of a Venn diagram where two events (A and B) are identified. The events
are mutually exclusive, meaning that they cannot occur together, and therefore the circles do not overlap.

¹ After the Russian mathematician Pafnuti Tchebysheff (1821-1894). Other transliterations of his name are Tchebycheff, Chebyshev
and Tchebichef.

Figure 6.4 Venn diagram for two mutually exclusive events.

The areas of the circles are denoted by A and B , and the probability of the occurrence of events A and
B are denoted by P ( A ) and P ( B ) :

P(A) = A/E
P(B) = B/E
You can think of a Venn diagram as an archery target. Imagine that you are firing an arrow at the
target and that you have an equal chance of landing anywhere within the target area, but will definitely
hit it somewhere. The circles on the target represent each possible event, so if your arrow lands in circle
A, it represents event A happening. In Figure 6.4 you cannot fire an arrow that will land in both A and
B at the same time, so events A and B cannot occur at the same time:

P(A ∩ B) = 0
The probability of either event occurring is then just the sum of the probabilities of each event,
because we just need to add the A and B areas together:

P(A ∪ B) = P(A) + P(B)
Events that are not mutually exclusive

In Figure 6.5, A and B are not mutually exclusive: they can occur together, represented by the overlap
in the Venn diagram. The figure shows the four different areas that are now produced. It can be seen
from these areas that

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Figure 6.5 Venn diagram for two events that are not mutually exclusive.


Figure 6.6 More complex Venn diagram example.

6.3.3 Central limit theorem
The central limit theorem (CLT) is one of the most important theorems for risk analysis modelling. It
says that the mean x̄ of a set of n variables (where n is large), drawn independently from the same
distribution f(x), will be normally distributed:

x̄ = Normal(μ, σ/√n)          (6.4)

where μ and σ are the mean and standard deviation of the f(x) distribution from which the n samples
are drawn.
Example 6.2

If we had 40 variables, each following a Uniform(1, 3) distribution (with mean = 2 and standard
deviation = 1/√3), the average of these variables would (approximately) have the following distribution:

x̄ = Normal(2, (1/√3)/√40) = Normal(2, 1/√120)

i.e. x̄ is approximately normally distributed with mean = 2 and standard deviation = 1/√120. +

Exercise 6.1: Create a variety of Monte Carlo models, averaging n distributions of the same type
with the same parameter values, and see what the resultant distribution looks like. Try different
values for n, e.g. n = 2, 5, 20, 50 and 100, and different distribution types, e.g. triangular, normal,
uniform and exponential. For what values of n are these average distributions close to normal? For
the triangular distribution, does this value of n vary depending on where the most likely parameter's
value lies relative to the minimum and maximum parameter values?
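A minimal Python sketch for Exercise 6.1 is given below; it is one way of setting the exercise up, not the only one. It averages n draws from a chosen distribution many times and reports the mean and standard deviation of the averages, which the CLT predicts to be μ and σ/√n.

import numpy as np

rng = np.random.default_rng(12)

def distribution_of_average(sampler, n, iterations=20_000):
    draws = sampler(size=(iterations, n))
    return draws.mean(axis=1)                 # one average per iteration

for n in (2, 5, 20, 50, 100):
    avg = distribution_of_average(lambda size: rng.uniform(1, 3, size), n)
    # CLT prediction: mean 2, standard deviation (1/sqrt(3))/sqrt(n)
    print(n, round(avg.mean(), 3), round(avg.std(), 4), round((1 / np.sqrt(3)) / np.sqrt(n), 4))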
It follows, by multiplying both sides of Equation (6.4) by n, that the sum, Σ, of n variables drawn
independently from the same distribution is given by

Σ = Normal(nμ, σ√n)


Example 6.3

The sum Σ of 40 Uniform(1, 3) independent variables will have (approximately) the following distribution:

Σ = Normal(40 × 2, (1/√3) × √40) = Normal(80, 3.65)
Remarkably, this theorem also applies to the sum (or average) of a large number of independent
variables that have different probability distribution types, in that their sum will be approximately
normally distributed providing no variable dominates the uncertainty of the sum.
The theorem can also be applied where a large number of positive variables are being multiplied
together. Consider a set of Xi, i = 1, . . . , n, variables that are being independently sampled from the
same distribution. Then their product, Π, is given by

Π = X_1 × X_2 × . . . × X_n

Taking the natural log of both sides:

ln Π = ln X_1 + ln X_2 + . . . + ln X_n

Since each variable X_i has the same distribution, the variables (ln X_i) must also have the same
distribution, and thus, from the central limit theorem, ln Π is normally distributed. Now, a variable is
lognormally distributed if its natural log is normally distributed, i.e. Π is lognormally distributed.
In fact, this application of the central limit theorem still approximately holds for the product of a large
number of independent positive variables that have different distribution functions. There are a lot of
situations where this seems to apply. For example, the volume of recoverable oil reserves within a field
is approximately lognormally distributed since it is the product of a number of independent(ish)
variables, i.e. reserve area, average thickness, porosity, gas/oil ratio, (1 - water saturation), etc.
Most risk analysis models are a combination of adding (subtracting) and multiplying variables together.
It should come as no surprise, therefore, that, from the above discussions, most risk analysis results seem
to be somewhere between normally and lognormally distributed. A lognormal distribution also looks
like a normal distribution when its mean is much larger than its standard deviation, so a risk analysis
model result even more frequently looks approximately normal. This particularly applies to project and
financial risk analyses where one is looking at cost or time to completion or the value of a series of
cashflows.
It is important to note from the results of this theorem that the distribution of the average of a set of
variables depends on the number of variables that are being averaged, as well as the uncertainty of each
variable. It may be tempting, at times, to seek an expert's estimate of the distribution of the average of a
number of variables; for example, the average time it will take to lay a kilometre of road, or the average
weight of the fleece of a particular breed of sheep. The reader can now see that it will be a difficult

task for experts to provide a distribution of an average measure: they will have to know the number of
variables for which the estimate is the average and then apply the central limit theorem - which is no
easy task to do in one's head. It is much better to estimate the distribution of the individual items and
do the central limit theorem calculations oneself.
Many parametric distributions can be thought of as the sum of a number of other identical distributions.
In general, if the mean is much larger than the standard deviation for these summary distributions, they
can be approximated by a normal distribution. The central limit theorem is then useful for determining
the parameters of the normal distribution approximation. Section III.9 discusses many of the useful
approximations of one distribution for another.

6.3.4 Binomial theorem
The binomial theorem says that for some values a and b and a positive integer n

(a + b)^n = Σ_{x=0}^n C(n, x) a^x b^(n-x)

The binomial coefficient, C(n, x), also sometimes written as nCx, is read as "n choose x" and is calculated as

C(n, x) = n! / (x!(n - x)!)          (6.6)

where the exclamation mark denotes factorial, so 4! = 1 * 2 * 3 * 4, for example. The binomial coefficient calculates the number of different ways one can order n articles where x of those articles are of
one type and therefore indistinguishable from one another and the remaining ( n - x ) are of another type,
again each being indistinguishable from another. The Excel function COMBIN calculates the binomial
coefficient.
The arguments underpinning this equation go as follows. There are n ! ways of ordering n articles, as there are n choices for the first article, then (n - 1 ) choices for the second, ( n - 2) choices
for the third, etc., until we are left with just the one choice for the last article. Thus, there are
n * (n - 1) * (n - 2) * . . . * 1 = n! different ways of ordering these articles. Now, suppose that x of
these articles were identical: we would not be able to differentiate between two orderings where we
simply swapped the positions of two of these articles. Repeating the logic above, there are x! different
orderings that would all appear the same to us, so we would only recognise 1/x! of the possible orderings
and the number of orderings would now be n!/x!. Now, suppose that the remaining (n - x) articles are
also identical but differentiable from the x articles. Then we could only distinguish 1/(n - x)! of the
remaining possible orderings, and thus the total number of different combinations is given by

n! / (x!(n - x)!)

Figure 6.7 Pascal's triangle.

A useful way of quickly calculating the binomial coefficients for small n is given by Pascal's triangle
(Figure 6.7). The outside of the triangle is filled with 1s, and each value inside the triangle is calculated

as the sum of the two values immediately above it. Row n then represents the binomial coefficient for
n, which also appears as the second value in each row, so, for example, each binomial coefficient can be
read directly from the triangle, as highlighted in the figure. Note that the binomial coefficients are symmetric so that

C(n, x) = C(n, n - x)

This makes sense, as, if we swap x for (n - x) in Equation (6.6), we arrive back at the same formula.
If we replace a with probability p, and b with probability (1 - p), the equation becomes

(p + (1 - p))^n = Σ_{x=0}^n C(n, x) p^x (1 - p)^(n-x) = 1

The summed component

C(n, x) p^x (1 - p)^(n-x)

is the binomial probability mass function for x successes in n trials where each trial has a probability
p of success. In a binomial process, all successes are considered identical and interchangeable, as are
all failures.
Properties of the binomial coefficient

Among the useful identities for the binomial coefficient is Vandermonde's theorem (A. T. Vandermonde, 1735-1796):

Σ_k C(m, k) C(n, p - k) = C(m + n, p)
Calculating x! for large x

x! is very laborious to calculate for high values of x. For example, 100! = 9.3326E+157 and Excel's
FACT() cannot calculate values higher than 170!. The probability mass functions of many discrete
probability distributions contain factorials, and we therefore often want to work out factorials for values
larger than 170. Algorithms for generating distributions get around any calculation restriction by using
approximations, for example the following equation, known as the Stirling² formula, can be used instead
to get a very close approximation:

x! ~ √(2πx) (x/e)^x

where ~ is read "asymptotically equal" and means that the right-hand side approaches the left-hand
side as x approaches infinity.
However, if you are attempting to calculate a probability exactly, you can still use the Excel function
GAMMALN():

x! = EXP(GAMMALN(x + 1))

This may allow you to manipulate multiplications of factorials, etc., by adding them in log space. But,
be warned, this formula will not return exactly the same answer as FACT() because of rounding,
and, while it is possible to get values for GAMMALN(x) where x > 171, Excel will return an error if
you attempt to calculate the corresponding EXP(GAMMALN(x)).
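Outside Excel the same trick is available through the log-gamma function. The Python sketch below is illustrative: it computes ln(x!) = lgamma(x + 1) and uses it to evaluate a binomial probability for values of n far beyond the reach of direct factorials.

import math

def log_factorial(x):
    return math.lgamma(x + 1)                      # ln(x!) without overflow

def binomial_pmf(x, n, p):
    log_pmf = (log_factorial(n) - log_factorial(x) - log_factorial(n - x)
               + x * math.log(p) + (n - x) * math.log(1 - p))
    return math.exp(log_pmf)

print(math.exp(log_factorial(100)))                # about 9.3326E+157
print(binomial_pmf(3, 100_000, 0.00002))           # works even though 100000! is enormous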

6.3.5 Bayes' theorem
Bayes' theorem³ is a logical extension of the conditional probability arguments we looked at in the
Venn diagram section. We saw that

P(A|B) = P(A ∩ B) / P(B)   and   P(B|A) = P(B ∩ A) / P(A)

² James Stirling (1692-1770), Scots mathematician.
³ Rev. Thomas Bayes (1702-1761), English philosopher. A short biography and a reprint of his original paper describing Bayes'
theorem appear in Press (1989).

and hence

P(A|B) = P(B|A) P(A) / P(B)

which is Bayes' theorem, and, in general,

P(A_i|B) = P(B|A_i) P(A_i) / Σ_j [P(B|A_j) P(A_j)]
The following example illustrates the use of this equation. Many more are given in the section on
Bayesian inference.
Example 6.4

Three machines A, B and C produce 20 %, 45 % and 35 % respectively of a factory's wheel nut output;
2 %, 1 % and 3 % respectively of these machines' outputs are defective:
(a) What is the probability that any wheel nut randomly selected from the factory's stock will be
defective? Let X be the event where the wheel nut is defective, and A, B and C be the events
where the selected wheel nut comes from machines A, B and C respectively:

P(X) = P(X|A)P(A) + P(X|B)P(B) + P(X|C)P(C) = 0.02 × 0.20 + 0.01 × 0.45 + 0.03 × 0.35 = 0.019
(b) What is the probability that a randomly selected wheel nut will have come from machine A if it
is defective?
From Bayes' theorem

P(A|X) = P(X|A)P(A) / P(X) = 0.004 / 0.019 ≈ 0.2105
In other words, in Bayes' Theorem we divide the probability of the required path (the probability that
it came from machine A and was defective) by the probability of all possible paths (the probability that
it came from any machine and was defective). +
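The wheel nut calculation can be reproduced in a few lines. The Python sketch below is illustrative only and applies the same total probability and Bayes' theorem steps.

# Production shares and defect rates for machines A, B and C
share = {"A": 0.20, "B": 0.45, "C": 0.35}
defect_rate = {"A": 0.02, "B": 0.01, "C": 0.03}

# (a) total probability that a randomly selected wheel nut is defective
p_defective = sum(share[m] * defect_rate[m] for m in share)
print(p_defective)                                   # 0.019

# (b) Bayes' theorem: probability it came from machine A given that it is defective
print(share["A"] * defect_rate["A"] / p_defective)   # about 0.2105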
Example 6.5

We wish to know the probability that an animal will be infected (I), given that it passes (Pa) a specific
veterinary check, i.e. P(I|Pa).


Figure 6.8 Event tree for Example 6.5.

The problem can be visualised by an event tree diagram (Figure 6.8). First of all, the animal will be
infected (I) or not infected (N). Secondly, the animal will either pass (Pa) or fail (F) the test.
From Bayes' theorem

P(I|Pa) = P(Pa|I) P(I) / (P(Pa|I) P(I) + P(Pa|N) P(N))

In veterinary terminology

P(I) = prevalence p, and thus P(N) = (1 - p)
P(F|I) = the sensitivity of the test Se, and thus P(Pa|I) = (1 - Se)
P(Pa|N) = the specificity of the test Sp

Putting these elements into Bayes' theorem,

P(I|Pa) = p(1 - Se) / (p(1 - Se) + (1 - p)Sp)

6.3.6 Taylor series
The Taylor series is a formula that determines a polynomial approximation in x of some mathematical
function f(x) centred at some value x_0:

f(x) = Σ_{m=0}^∞ f^(m)(x_0) (x - x_0)^m / m!

where f^(m) represents the mth derivative with respect to x of the function f


In the special case where x_0 = 0, the series is known as the Maclaurin series of f(x):

f(x) = Σ_{m=0}^∞ f^(m)(0) x^m / m!

The Taylor and Maclaurin series expansions are also used to provide polynomial approximations to
probability distribution functions.
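As a small illustration (not from the book), the Python sketch below builds the Maclaurin polynomial of exp(x) from the formula above and shows how the approximation improves as more terms are added.

import math

def maclaurin_exp(x, terms):
    # f(x) ~ sum over m of f^(m)(0) * x^m / m!, and every derivative of exp at 0 is 1
    return sum(x ** m / math.factorial(m) for m in range(terms))

for terms in (2, 4, 8, 12):
    print(terms, maclaurin_exp(1.5, terms), math.exp(1.5))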

6.3.7 Tchebysheff's rule
If a dataset has mean T and standard deviation s, we are used to saying that 68 % of the data will lie
between (T - s ) and (T s), 95 % lie between (T - 2s) and (T 2s), etc. However, that is only true
when the data follow a normal distribution. The same applies to a probability distribution. So, when the
data, or probability distribution, are not normally distributed, how can we interpret the standard deviation?
Tchebysheff's rule applies to any probability distribution or dataset. It states:


"For any number k greater than 1, at least (1 - l / k 2 ) of the measurements will fall
within k standard deviations of the mean".
Substituting k = 1, Tchebysheff's rule says that at least 0 % of the data or probability distribution lies within one standard deviation of the mean. Well, we already knew that! However, substituting k = 2 tells us that at least 75 % of the data or distribution lie within two standard deviations of the mean. That is useful information because it applies to all distributions.
This is a fairly conservative rule in that, if we know the distribution type, we can specify a much
higher percentage (e.g. 95 % for two standard deviations for a normal distribution, compared with 75 %
with Tchebysheff's rule), but it is certainly helpful in interpreting the standard deviation of a dataset or
probability distribution that is grossly non-normally distributed.
From Figure 6.9 you can see that, for any k, knowing the distribution type allows you to specify a much higher fraction of the distribution to be contained in the range mean ± k standard deviations.
The bimodal distribution tested was as shown in Figure 6.10.
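To get a feel for how conservative the rule is, the following sketch (mine, using Python's statistics module) compares the Tchebysheff lower bound with the exact coverage of a normal distribution:

from statistics import NormalDist

# Tchebysheff lower bound vs the exact figure for a Normal distribution
for k in (1.5, 2, 3):
    bound = 1 - 1 / k**2
    normal = NormalDist().cdf(k) - NormalDist().cdf(-k)
    print(k, round(bound, 3), round(normal, 4))   # e.g. k=2: 0.75 vs 0.9545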

6.3.8 Markov inequality
The Markov inequality gives some indication of the range of a distribution, in a similar way to Tchebysheff's rule. It states that, for a non-negative random variable X with mean μ,

P(X ≥ k) ≤ μ/k

for any constant k greater than μ.
So, for example, for a random variable with mean 6, the probability of being greater than 20 is less than or equal to 6/20 = 30 %.
Of course, being very general like Tchebysheff's rule, it makes a rather conservative statement. For most distributions, the probability is much smaller than μ/k (see Table 6.1 for some examples).
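A quick numerical illustration (mine, assuming an exponential distribution with mean 6 for the comparison):

from math import exp

mean, k = 6.0, 20.0
markov_bound = mean / k              # 0.30
exact_exponential = exp(-k / mean)   # P(X > 20) for an Exponential with mean 6, ~0.036
print(markov_bound, exact_exponential)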


Figure 6.9 Comparison of Tchebysheff's rule with the results of a few distributions (y axis: percentage of the distribution within k standard deviations of the mean, 0-100 %; one of the curves is the normal distribution).

Figure 6.10 A bimodal distribution (x axis: variable value, roughly −100 to 150).

Table 6.1 Markov's rule for different distributions.

Distribution with μ = 6        P(X > 20)
Lognormal(6, σ)                Max. of 6.0 %
Pareto(θ, 6(θ − 1)/θ)          Max. of 3.21 %

6.3.9 Least-squares linear regression
The purpose of least-squares linear regression is to represent the relationship between one or more independent variables x1, x2, ... and a variable y that is dependent upon them in the following form:

yi = β0 + Σj βj xji + εi

where xji is the ith observed value of the independent variable xj, yi is the ith observed value of the dependent variable y, εi is the error term or residual (i.e. the difference between the observed y value and that predicted by the model), βj is the regression slope for the variable xj and β0 is the y-axis intercept.
Simple least-squares linear regression assumes that there is only one independent variable x. If we assume that the error terms are normally distributed, the equation reduces to

yi = m xi + c + Normal(0, s)

where m is the slope of the line, c is the y-axis intercept and s is the standard deviation of the variation of y about this line.
Simple least-squares linear regression is a very standard statistical analysis technique, particularly when one has little or no idea of the relationship between the x and y variables. It is probably particularly common because the analysis mathematics are simple (because of the normality assumption), rather than because it is a very common rule for the relationship between variables. LSR makes four important assumptions:
1. Individual y values are independent.
2. For each xi there are an infinite number of possible values of y, which are normally distributed.
3. The distribution of y given a value of x has equal standard deviation for all x values and is centred about the least-squares regression line.
4. The means of the distribution of y at each x value can be connected by a straight line y = mx + c.
These assumptions behind least-squares regression analysis are illustrated in Figure 6.11.

Statisticians often make transformations of the data (e.g. Log(y), √x) to force a linear relationship. That greatly extends the applicability of the regression model, but one must be particularly careful that the errors are reasonably normal, and one runs an enormous risk in using the regression equations to make predictions outside the range of observations.


Figure 6.11 An illustration of the concepts of least-squares regression.
Estimation of parameters

The simple least-squares regression model determines the straight line that minimises the sum of the squares of the εi errors. It can be shown that this occurs when

m = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²    and    c = ȳ − m x̄

where x̄, ȳ are the means of the observed x and y data and n is the number of data pairs (xi, yi).
The fraction of the total variation in the dependent variable that is explained by the independent variable is known as the coefficient of determination R², which is calculated as

R² = 1 − SSE/TSS

where the sum of squares of errors, SSE, is given by

SSE = Σi (yi − ŷi)²

and the total sum of squares, TSS, is given by

TSS = Σi (yi − ȳ)²

and where ŷi are the predicted y values at each xi:

ŷi = m xi + c

For simple least-squares regression (i.e. only one independent variable), the square root of R² is equivalent to the simple correlation coefficient r:

r = ±√R²


Correlation coefficient r may alternatively be calculated as

r = Σi (xi − x̄)(yi − ȳ) / √[Σi (xi − x̄)² Σi (yi − ȳ)²]

Coefficient r provides a quantitative measure of the linear relationship between x and y. It ranges from −1 to +1: a value of r = −1 or +1 indicates a perfect linear fit, and r = 0 indicates that no linear relationship exists at all. As SSE (the sum of squared errors between the observed and predicted y values) tends to zero, so r² tends to 1 and therefore r tends to −1 or +1, its sign depending on whether m is negative or positive respectively.
The value of r is used to determine the statistical significance of the fitted line, by first calculating the test statistic t as

t = r √[(n − 2)/(1 − r²)]

The t-statistic follows a t-distribution with (n − 2) degrees of freedom (provided the linear regression assumption of normally distributed variation of y about the regression line holds), which is used to determine whether the fit should be rejected or not at the required level of confidence.
The standard error of the y estimate, Syx, is calculated as

Syx = √[SSE/(n − 2)] = √[Σi (yi − ŷi)² / (n − 2)]

This is equivalent to the standard deviation of the error terms εi. These errors reflect the true variability of the dependent variable y from the least-squares regression line. The denominator (n − 2) is used, instead of the (n − 1) we have seen before for sample standard deviation calculations, because two values m and c have been estimated from the data to determine the equation values, and we have therefore lost two degrees of freedom instead of the one degree of freedom usually lost in determining the mean.
The equation of the regression line and the Syx statistic can be used together to produce a stochastic model of the relationship between x and y, as follows:

y = m x + c + Normal(0, Syx)

Some caution is needed in using such a model. The regression model is intended to work within the range of the independent variable x for which there have been observations. Using the model outside this range can produce very significant errors if the relationship between x and y deviates from the assumed linear relationship. This is also purely a model of variability, i.e. we are assuming that the linear relationship is correct and that the parameters are known. We should also include our uncertainty about the parameters, and perhaps about whether the linear relationship is even appropriate.
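The estimation formulae above are easy to reproduce; the sketch below (my own, with made-up data rather than the survey of Example 6.6) computes m, c, r and Syx directly:

import numpy as np

# Hypothetical (x, y) data - not the book's Table 6.2 values
x = np.array([505, 974, 1461, 1998, 2637, 3096], dtype=float)
y = np.array([210, 290, 350, 430, 520, 570], dtype=float)

n = len(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()
y_hat = m * x + c
sse = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r = np.sign(m) * np.sqrt(1 - sse / tss)
syx = np.sqrt(sse / (n - 2))

print(m, c, r, syx)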


Example 6.6

Consider the dataset in Table 6.2, which shows the result of a survey of 30 people. They were asked to provide details of their monthly net income {xi} and the amount they spent on food each month {yi}. The values of m, c, r and Syx were calculated using the Excel functions SLOPE(), INTERCEPT(), CORREL() and STEYX() respectively.
The line ŷi = m xi + c is plotted against the data points in Figure 6.12. ♦

Table 6.2 Data for Example 6.6. Columns: net monthly income x; monthly food expenditure y; least-squares regression estimate ŷ; error terms ε.

Net monthly income x: 505, 517, 523, 608, 609, 805, 974, 1095, 1110, 1139, 1352, 1453, 1461, 1543, 1581, 1656, 1748, 1760, 1811, 1944, 1998, 2054, 2158, 2229, 2319, 2371, 2637, 2843, 2889, 3096.


Figure 6.12 The line ŷi = m xi + c plotted against the data points from Table 6.2 (x axis: net monthly income).

Figure 6.13 Distribution of the error terms (error term values roughly −150 to 150).

The error terms εi = yi − ŷi are shown in Figure 6.13.
A distribution fit of these εi values shows that they are approximately normally distributed. A test of significance of r also shows that, for 28 degrees of freedom (n − 2), there is only about a 5 × 10⁻¹¹ chance that such a high value of r could have been observed from purely random data. We would therefore feel confident in modelling the relationship between any net monthly income value N (between the values 505 and 3096) and monthly expenditure on food F using

F = m N + c + Normal(0, Syx)

with the fitted values of m, c and Syx.

Uncertainty about least-squares regression parameters

The parameters m, c and Syx for the least-squares regression represent the best estimate of the variability model where we are assuming some stochastically linear relationship between x and y. However, since


we will have only a limited number of observations (i.e. {x, y} pairs), we do not have perfect knowledge of the stochastic system and there is therefore some uncertainty about the regression parameters. The t-test tells us whether the linear relationship might exist at some level of confidence. More useful, however, from a risk analysis perspective is that we can readily determine distributions of uncertainty about these parameters using the bootstrap.
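A minimal sketch of that bootstrap idea (again with the same made-up data, resampling the {x, y} pairs with replacement and refitting) might look like this:

import numpy as np

rng = np.random.default_rng(1)

def fit(x, y):
    m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return m, y.mean() - m * x.mean()

# Hypothetical data, as in the previous sketch
x = np.array([505, 974, 1461, 1998, 2637, 3096], dtype=float)
y = np.array([210, 290, 350, 430, 520, 570], dtype=float)

slopes = []
for _ in range(2000):
    idx = rng.integers(0, len(x), len(x))   # resample pairs with replacement
    m_b, c_b = fit(x[idx], y[idx])
    slopes.append(m_b)

print(np.percentile(slopes, [2.5, 50, 97.5]))   # uncertainty distribution for the slope m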

6.3.10 Rank order correlation coefficient
Spearman's rank order correlation coefficient p is a non-parametric statistic for quantifying the correlation relationship between two variables. Non-parametric means that the correlation statistic is not
affected by the type of mathematical relationship between the variables, unlike linear least-squares
regression analysis, for example, which requires the relationship to be described by a straight line with
normally distributed variation of the dependent variable about that line.
Calculating the rank order correlation analysis proceeds as follows. Replace the n observed values
for the two variables X and Y by their ranking: the largest value for each variable has a rank of 1, the
smallest a rank of n, or vice versa. The Excel function RANK() can do this, but it is inaccurate where
there are ties, i.e. where two or more observations have the same value. In such cases, one should assign
to each of the same-valued observations the average of the ranks they would have had if they had been
infinitesimally different from the value they take.
The Spearman rank order correlation coefficient ρ is calculated as

ρ = 1 − [6 Σi (ui − vi)²] / [n(n² − 1)]

where ui and vi are the ranks of the ith pair of the X and Y variables. This is, in fact, a shortcut formula: it is not exact when there are tied measurements, but still works well when there are not too many ties relative to the size of n. The exact formula is the linear correlation coefficient applied to the ranks:

ρ = Σi (ui − ū)(vi − v̄) / √[Σi (ui − ū)² Σi (vi − v̄)²]

where ū and v̄ are the mean ranks, and where ui and vi are the ranks of the ith observation in samples 1 and 2 respectively. This calculation does not require that one identify which variable is dependent and which is independent: the calculation for ρ is symmetric, so X and Y could swap places with no effect on the value of ρ. The value of ρ varies from −1 to +1 in the same way as the least-squares regression coefficient r. A value of ρ close to


The mode

The mode is the x value with the greatest probability p(x) for a discrete distribution, or the greatest probability density f(x) for a continuous distribution. The mode is not uniquely defined for a discrete distribution with two or more values that have the equal highest probability. For example, a distribution of the number of heads in three tosses of a coin gives equal probability (3/8) to both one and two heads.
The mode may also not be uniquely defined if a distribution is multimodal (i.e. it has two or more peaks).

The median x0.5

The median is the value that the variable has a 50 % probability of exceeding, i.e.

F(x0.5) = 0.5
An interesting property of unimodal probability distributions relates the relative positions of the
mean, mode and median. If the distribution is right (positively) skewed, these three measures of central
tendency are positioned from left to right: mode, median and mean (see Figure 6.14). Conversely, a
unimodal left (negatively) skewed distribution has them in the reverse order. For a unimodal, symmetric
distribution, the mode, median and mean are all equal.

6.4.2 Measures of spread
Variance V

The variance is a measure of how much the distribution is spread from the mean:

V = E[(x − μ)²]

where E[·] denotes the expected value (mean) of whatever is in the brackets, so

V = Σ (x − μ)² p(x) for a discrete variable, or V = ∫ (x − μ)² f(x) dx for a continuous variable.

Figure 6.14 Relative positions of the mode, median and mean of a right-skewed unimodal distribution (50 per cent of the distribution lies on either side of the median; x axis: variable value).


Thus, the variance sums up the squared distance from the mean of all possible values of x, weighted by the probability of x occurring. The variance is known as the second moment about the mean. It has units that are the square of the units of x. So, if x is cows in a random field, V has units of cows². This limits the intuitive value of the variance.
Standard deviation σ

The standard deviation is the positive square root of the variance, i.e. σ = √V. Thus, if the variance has units of cows², the standard deviation has units of cows, the same as the variable x. The standard deviation is therefore more popularly used to express a measure of spread.

Example 6.8

The variance V of the Uniform(1, 3) distribution is calculated as follows:

V = E(x²) − μ²

E(x²) = ∫ from 1 to 3 of x² (1/2) dx = (3³ − 1³)/6 = 13/3

μ = 2 from before, so V = 13/3 − 2² = 1/3

and the standard deviation σ is therefore

σ = √(1/3) ≈ 0.577 ♦

Variance and standard deviation have the following properties, where a is some constant and X and Xi are random variables:
1. V(X) ≥ 0 and σ(X) ≥ 0.
2. V(aX) = a² V(X) and σ(aX) = |a| σ(X).
3. V(Σ i=1..n Xi) = Σ i=1..n V(Xi), providing the Xi are uncorrelated.

6.4.3 Mean, standard deviation and the normal distribution
For a normal distribution only, the areas bounded 1, 2 and 3 standard deviations either side of the mean
contain approximately 68.27, 95.45 and 99.73 % of the distribution, as shown in Figure 6.15. Since a
lot of distributions look similar to a normal distribution under certain conditions, people often think
of 70 % of a distribution being reasonably contained within one standard deviation either side of the
mean, but this rule of thumb must be used with care. If it is applied to a distribution that is significantly
non-normal, like an exponential distribution, the error can be quite large (the range μ ± σ contains 87 % of an exponential distribution, for example).


Figure 6.15 Some probability areas of the normal distribution.

Example 6.9

Panes of bullet-proof glass manufactured at a factory have a mean thickness over a pane that is normally distributed, with a mean of 25 mm and a variance of 0.04 mm². If 10 panes are purchased, what is the probability that all the panes will have a mean thickness between 24.8 and 25.4 mm?
The distribution of the mean thickness of a randomly selected pane is Normal(25, 0.2) mm, since the variance is the square of the standard deviation; 24.8 mm is one standard deviation below the mean, 25.4 mm is two standard deviations above the mean. The probability p that a pane lies between 24.8 and 25.4 mm is then half the probability of lying within ± one standard deviation of the mean plus half the probability of lying within ± two standard deviations of the mean, i.e. p = (68.27 % + 95.45 %)/2 = 81.86 %. The probability that all 10 panes will have a mean thickness between 24.8 and 25.4 mm, provided that they are independent of each other, is therefore (81.86 %)^10 = 13.51 %. ♦
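The same calculation can be checked with the normal cumulative distribution function; a short sketch in Python (statistics.NormalDist):

from statistics import NormalDist

pane = NormalDist(mu=25, sigma=0.2)
p_one = pane.cdf(25.4) - pane.cdf(24.8)   # ~0.8186 for a single pane
print(p_one, p_one ** 10)                 # ~0.8186, ~0.1351 for all 10 panes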

6.4.4 Measures of shape
The mean and variance are called the first moment about zero and the second moment about the mean.
The third and fourth moments about the mean, called skewness and kurtosis respectively, are also
occasionally used in risk analysis.
Skewness S

The skewness statistic is calculated from the following formulae:

Discrete variable:  S = Σx (x − μ)³ p(x) / σ³

Continuous variable:  S = ∫ from min to max of (x − μ)³ f(x) dx / σ³

Figure 6.16 Examples of skewness (left-hand panel) and kurtosis (right-hand panel).

This is often called the standardised skewness, as it is divided by σ³ to give a unitless statistic. The
skewness statistic refers to the lopsidedness of the distribution (see left-hand panel of Figure 6.16). If
a distribution has a negative skewness (sometimes described as left skewed), it has a longer tail to the
left than to the right. A positively skewed distribution (right skewed) has a longer tail to the right, and
zero-skewed distributions are usually symmetric.
Kurtosis K

The kurtosis statistic is calculated from the following formulae:

Discrete variable:  K = Σx (x − μ)⁴ p(x) / σ⁴

Continuous variable:  K = ∫ from min to max of (x − μ)⁴ f(x) dx / σ⁴

This is often called the standardised kurtosis, since it is divided by σ⁴, again to give a unitless
statistic. The kurtosis statistic refers to the peakedness of the distribution (see right-hand panel of
Figure 6.16) - the higher the kurtosis, the more peaked is the distribution. A normal distribution has a
kurtosis of 3, so kurtosis values for a distribution are often compared with 3. For example, if a distribution
has a kurtosis below 3 it is flatter than a normal distribution. Table 6.3 gives some examples of skewness
and kurtosis for common distributions.
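If you have SciPy to hand, the theoretical skewness and kurtosis of a distribution can be read off directly; a small sketch for the exponential distribution (my example; note that SciPy reports excess kurtosis, i.e. kurtosis minus 3):

from scipy.stats import expon

s, k_excess = expon.stats(moments="sk")
print(s, k_excess + 3)   # skewness 2 and kurtosis 9 for an exponential distribution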

6.4.5 Raw and central moments
There are three sets of moments that are used in probability modelling to describe a distribution of a
random variable x with density function f(x). The first set are called raw moments μ'k. The kth raw moment is defined as

μ'k = E[x^k] = ∫ from min to max of x^k f(x) dx

Table 6.3 Skewness and kurtosis for some common distributions (Binomial, ChiSq, Exponential, Lognormal, Normal, Poisson, Triangular and Uniform).

where k = 1, 2, 3, ..., or, for discrete variables with probability mass p(x), as

μ'k = E[x^k] = Σ from min to max of x^k p(x)

Then we have the central moments, μk, defined as

μk = E[(x − μ)^k] = ∫ from min to max of (x − μ)^k f(x) dx,   k = 2, 3, ...

where μ = μ'1 is the mean of the distribution. Finally, we have the normalised moments:
Mean = μ
Variance = μ2
Skewness = μ3 / (Variance)^(3/2)
Kurtosis = μ4 / (Variance)²

The normalised moments are what appear most often in this book because they allow us to compare distributions most easily. One can translate between raw and central moments as follows:
From raw moments to central moments:

μ2 = μ'2 − μ²,   μ3 = μ'3 − 3μ μ'2 + 2μ³,   μ4 = μ'4 − 4μ μ'3 + 6μ² μ'2 − 3μ⁴

From central moments to raw moments:

μ'2 = μ2 + μ²,   μ'3 = μ3 + 3μ μ2 + μ³,   μ'4 = μ4 + 4μ μ3 + 6μ² μ2 + μ⁴

You might wonder why we don't always use normalised moments and avoid any confusion. Central moments don't actually have much use in risk analysis - they are more of an intermediary calculation step, but raw moments are very useful. First of all, the equations are simpler and therefore sometimes easier to calculate than central moments, and we can then convert them to central moments using the equations above. Secondly, they allow us to determine the moments of some combinations of random variables. For example, consider a variable Y that has probability p of taking a value from variable A and a probability (1 − p) of taking a value from variable B:

E[Y^k] = p E[A^k] + (1 − p) E[B^k]

You may also come across something called a moment generating function. This is a function MX(t) specific to each distribution and defined as

MX(t) = E[exp(tX)]

where t is a dummy variable. This leads to the relationship with raw moments:

μ'k = the kth derivative of MX(t) with respect to t, evaluated at t = 0

For example, the Normal(μ, σ) distribution has MX(t) = exp(μt + σ²t²/2), from which we get E[X] = μ and E[X²] = μ² + σ².

The great thing about moment generating functions is that we can use them with sums of random variables. For example, if Y = rA + sB, where A and B are independent random variables and r and s are constants, then

MY(t) = MA(rt) MB(st)

Note that, for a few distributions, not all moments are defined. The calculation of the moments of the Cauchy distribution, for example, is the difference between two integrals that give infinite values. More commonly, a few distributions don't have defined moments unless their parameters exceed a certain value. Appendix III lists these distributions and the restrictions.
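As a quick check of the Normal moment generating function result (a sketch using sympy, which is not part of the book's toolset):

import sympy as sp

t, mu, sigma = sp.symbols("t mu sigma", positive=True)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)     # MGF of Normal(mu, sigma)

# kth raw moment = kth derivative of the MGF, evaluated at t = 0
for k in (1, 2):
    print(sp.simplify(sp.diff(M, t, k).subs(t, 0)))   # mu, then mu**2 + sigma**2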

Chapter 7

Building and running a model
In this chapter I give a few tips on how to build a risk analysis model and techniques for making it
run faster - very useful if your model is either very large or needs to be run for many iterations. I also
explain the most common errors people make in their modelling.

7.1 Model Design and Scope
Risk analysis is about supporting decisions by answering questions about risk. We attempt to provide
qualitative and, where time and knowledge permit, quantitative information to decision-makers that is
pertinent to their questions. Inevitably, decision-makers must deal with other factors that may not be
quantified in a risk analysis, which can be frustrating for a risk analyst when they see their work being
"ignored". Don't let it frustrate you: the best risk analysts remain professionally neutral to the decisions
that are made from their work. Our job is to make sure that we have represented the current knowledge
and how that affects the variables on which decisions are made. Remaining neutral also relieves you of
being frustrated by lack of available data or adequate opinion - you just have to work with what you
have.
The first step to designing a good model is to put yourself in the position of the decision-maker
by understanding how the information you might provide connects to the questions they are asking. A
decision-maker often does not appreciate all that comes with asking a question in a certain way, and
may not initially have worked out all the possible options for handling the risk (or opportunity).
When you believe that you properly understand the risk question or questions that need(s) answering,
it is time to brainstorm with colleagues, stakeholders and the managers about how you might put
an analysis together that satisfies the managers' needs. Effort put into this stage pays back tenfold:
everyone is clear on the purpose of your analysis; the participants will be more cooperative in providing
information and estimates; and you can discuss the feasibility of any risk analysis approach. Consider
going through the quality check methods I described in Chapter 3. I recommend you think of mapping
out your ideas with Venn diagrams and event trees. Then look at the data (and perhaps expertise for
subjective estimates) you believe are available to populate the model. If there are data gaps (there usually
are), consider whether you will be able to get the necessary data to fill the gaps, and quickly enough
to be able to produce an analysis within the decision-maker's timeframe. If the answer is "no", look
for other ways to produce an analysis that will meet the decision-maker's needs, or perhaps a subset
of those needs. But, whatever you do, don't embark on a risk analysis where you know that data gaps
will remain and your decision-maker will be left with no useful support. Some scientists argue that risk
analysis can also be for research purposes - to determine where the data gaps lie. We see the value in
that determination, of course, but, if that is your purpose, state it clearly and don't leave any expectation
from the managers that will be unfulfilled.


7.2 Building Models that are Easy to Check and Modify
The better a model is explained and the better it is laid out, the easier it is to check. Model building is
an iterative process, which means that you should construct your model to make it easy to add, remove
and modify elements. A few basic rules will help you do this:
Dedicate one sheet of the workbook to recording the history of changes to the model since conception,
with emphasis on changes since the previous version.
Document the model logic, data sources, etc., during the model build. It may seem tedious, especially
for the parts you end up discarding, but writing down what you do as you go along ensures the
documentation does get done (otherwise we move on to the next problem, the model remains a black
box to others, etc.) and also gives you a great self-check on your approach.
Avoid really long formulae if possible unless it is a formula you use very often. It might be rather
satisfying to condense some complex logic into a single cell, but it will be very hard for someone
else to figure out what you did.
Avoid writing macros that rely on model elements being at specific locations in the workbook or
in other files. Add plenty of annotations to macros. Don't put model parameter values in the macro
code. Give each macro and input parameter a sensible name.
Avoid being geeky - I'm reviewing a spreadsheet model right now written 10 years ago by a guy who is no longer around. It is almost completely written in macros, with almost no annotation, but worst of all is that he wrote the model to allow it automatically to expand to accommodate more assets, though there was no such requirement. He created dozens of macros to do simple things like search a table that would normally be done with a VLOOKUP or OFFSET function, and placing everything in macros linked to other macros, etc., means one cannot use Excel's audit tools like Trace Precedents. It also takes maybe 100 times longer to run than it should.
Break down a complex section into its constituent parts. This may best be done in a separate
area of the model and the result placed into a summary area. Hit the F9 key (or whatever will
generate another scenario) to see that the constituent parts are all working well. Often, in developing
ModelRisk functions, we have built spreadsheet models to replicate the logic and have found that
doing so can give us ideas for improvements too.
Use a single formula for an array (e.g. column) so that only one cell need be changed and the
formula copied across the rest of the array.
Keep linking between sheets to a minimum. For example, if you need to do a calculation on a dataset residing in one sheet, do it in that sheet, then link the calculation to wherever it needs to be used. This saves huge formulae that are difficult to follow, like: =VoseCumulA('Capital required'!G25,'Capital required'!G26,'Capital required'!G28:G106,'Capital required'!H28:H106).
Create conditional formatting and alerts that tell you when impossible or irrelevant values occur in the model. ModelRisk functions have a lot of imbedded checks so that, for example, VoseNormal(0, -1) will return the text "Error: sigma must be >= 0" rather than Excel's rather unhelpful #VALUE! approach. If you write macros, include similarly meaningful error messages.
Use the Data/Validation tool in Excel to format cells so that another user cannot input inappropriate values into the model - for example, they cannot input a non-integer value for an integer variable.


Use the Excel Tools/Protection/Protect Sheet function together with the Tools/Protection/Allow Users to Edit Ranges function to ensure other users can only modify input parameters (not calculation cells).
In general, keep the number of unique formulae as small as possible - we often write columns
containing the same formulae repeatedly with just the references changing. If you do need to write
a different formula in certain cells of an array (usually the beginning or end), consider giving them
a different format (we tend to use a grey background).
Colour-code the model elements: we use blue for input data and red for outputs.
Make good use of range naming. To give a name to a cell or range of contiguous cells, select the cells,
click in the name box and type the name you want to use. So, for example, cell A1 might contain the value 22. Giving it the label "Peter" means that typing "=Peter" anywhere else in the sheet will
return the value 22. For a lot of probability distributions there are standard conventions for naming
the parameters of your model. For example, =VoseHypergeo(n, D, M) and VoseGamma(alpha, beta).
So, if you have just one or two of these distributions in your model, using these names (e.g. alphal,
alpha2, etc., for each gamma distribution) actually makes it easier to write the formulae too. Note
that a cell or range may have several names, and a cell in a range may have a separate name from
the range's name. Don't follow my lead here because, for the purposes of writing models you can
read in a book, I've rarely used range names.

7.3 Building Models that are Efficient
A model is most efficient when:
1. It takes the least time to run.
2. It takes the least effort to maintain and requires the least amount of assumptions.
3. It has a small file size (memory and speed issues).
4. It supports the most decision options (see Chapters 3 and 4).

7.3.1 Least time to run
Microsoft are making efforts to speed up Excel, but it has a very heavy visual interface that really can slow things down. I'll look at a few tips for making Excel run faster first, then for making your simulation software run faster and then for making a model that gets the answer faster. Finally, I'll give you some ideas on how to determine whether you can stop the model because you've run enough iterations.
Making Excel run faster

Excel scans for calculations through worksheets in alphabetical order of the worksheet name, and
starts at cell A1 in each sheet, scans the row and drops down to the next row. Then it dances around
for all the links to other cells until it finds the cells it has to calculate first. It can therefore speed
things up if you give names to each sheet that reflect their sequence (e.g. start each sheet with
"1. Assumptions", "2. Market projection", "3.. . . " etc.), and keep the calculations within a sheet
flowing down and across.


Avoid array functions as they are slow to calculate, although faster than an equivalent VBA function.
Use megaformulae (with the above caution) as they run about twice as fast as intermediary calculations, and 10 times as fast as VBA calculations.
Custom Excel functions run more slowly than built-in functions but speed up model building and
model reliability. Be careful with custom functions because they are hard to check through. There
are a number of vendors, particularly in the finance field, who sell function libraries.
Avoid links to external files.
Keep the simulation model in one workbook.
Making your simulation software run faster

Turn off the Update Display feature if your Monte Carlo add-in has that ability. It makes an enormous
difference if there are imbedded graphs.
Use Multiple CPUs if your simulation software offers this. It can make a big difference.
Avoid the VoseCumulA(), VoseDiscrete(), VoseDUniform(), VoseRelative() and VoseHistogram()
distributions (or other product's equivalents) with large arrays if possible, as they take much longer
to generate values than other distributions.
Latin hypercube sampling gets to the stable output quicker than Monte Carlo sampling, but the effect
gets increasingly quickly lost the more significant distributions there are in the model, particularly
if the model is not just adding and/or subtracting distributions. The sampling methods take the same
time to run, however, so it makes sense to use Latin hypercube sampling for simulation runs.
Run bootstrap analyses and Bayesian distribution calculations in a separate spreadsheet when you
are estimating uncorrelated parameters, fit the results using your simulation software's fitting tool
and, if the fit is good, use just the fitted distributions in your simulation model. This does have the
disadvantage, however, of being more laborious to maintain when more data become available.
If you write VBA macros, consider whether they need to be declared as volatile.
Getting the answer faster

As a general rule, it is much better to be able to create a probability model that calculates, rather
than simulates, the required probability or probability distribution. Calculation is preferable because the
model answer is updated immediately if a parameter value changes (rather than requiring a re-simulation
of the model), and more importantly within this context it is far more efficient.
For example, let's imagine that a machine has 2000 bolts, each of which could shear off within a
certain timeframe with a 0.02 % probability. We'll also say that, if a bolt shears off, there is a 0.3 %
probability that it will cause some serious injury. What is the probability that at least one injury will
occur within the timeframe? How many injuries could there be?
The pure simulation way would be to model the number of bolt shears
Shears = VoseBinomia1(2000,0.02 %)
and then model the number of injuries
Injuries = VoseBinomial(Shears, 0.3 %)

Figure 7.1 Example model determining a risk analysis outcome by calculation.

Number of injuries x    Excel          ModelRisk
0                       9.988E-01      9.988E-01
1                       1.199E-03      1.199E-03
2                       7.188E-07      7.188E-07
3                       2.872E-10      2.872E-10
4                       8.604E-14      8.604E-14
5                       0.000E+00      2.061E-17

Formulae table
C8:C13   =BINOMDIST(B8,Bolts,Pshear*Pinjury,FALSE)
D8:D13   =VoseBinomialProb(B8,Bolts,Pshear*Pinjury,FALSE)
Graph    =SERIES(,Sheet1!$B$8:$B$13,Sheet1!$D$8:$D$13,1)
Chart: probability of x injuries.

Or we could recognise that each bolt has a 0.02 % * 0.3 % chance of causing injury, so

Injuries = VoseBinomial(Bolts, 0.02 % * 0.3 %)
Run a simulation for enough iterations and the fraction of the iterations where Injuries > 0 is the required probability, and collecting the simulated values gives us the required distribution. However, on average we should see 2000 * 0.02 % * 0.3 % = 0.0012 injuries (that's 1 in 833), so your simulation will generate about 830 zeros for every non-zero value; for us to get an accurate description of the result (e.g. have 1000 or so non-zero values), we would have to run the model a long time. A better approach is to calculate the probabilities and construct the required distribution, as in the model shown in Figure 7.1. I have used Excel's BINOMDIST function to calculate the probability of each number of injuries x. You can see the probability of non-zero values is pretty small, hence the need for the y axis in the chart to be shown in log scale. The beauty of this method is that any change to the parameters immediately produces a new output. I have also shown the same calculation with ModelRisk's VoseBinomialProb function, which does the same thing but more accurately: the probability that x = 5 is not actually zero (obviously), as BINOMDIST would have us believe - Excel's statistical functions aren't very good.
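The same calculation-instead-of-simulation idea is easy to reproduce outside Excel; a minimal Python sketch using SciPy's binomial probability mass function (variable names are mine):

from scipy.stats import binom

n_bolts = 2000
p_injury = 0.0002 * 0.003        # P(bolt shears) * P(shear causes injury)

for x in range(6):
    print(x, binom.pmf(x, n_bolts, p_injury))    # probability of exactly x injuries
print(1 - binom.pmf(0, n_bolts, p_injury))       # P(at least one injury), ~0.0012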
Of course, most of the risk analysis problems we face are not as simple as the example above, but we can nonetheless often find shortcuts. For example, imagine that we believe that the maximum daily wave height around a particular offshore rig follows a Rayleigh(7.45) distribution, in metres. The deck height D (distance from water at rest to underside of lower deck structure) is 32 metres, and the damage that will be caused, as a fraction f of the value of the rig, is a function of the wave height x above the deck level following the equation

f = (1 + ((x − D)/1.6)^(−0.91))^(−0.82)  for x > D, and f = 0 otherwise


Figure 7.2 Offshore platform damage model showing three methods to estimate expected damage as a fraction of rig value.

Inputs: Deck height (D1) = 32 metres; Rayleigh parameter (D2) = 7.45.

a) Pure simulation (one row per day of the year)
Max wave height (m):   =VoseRayleigh($D$2)
Loss (fraction):       =IF(C6>$D$1,(1+((C6-$D$1)/1.6)^-0.91)^-0.82,0)
Output (= mean):       =SUM of the daily loss fractions over the year

b) Simulation and calculation
P(wave > deck) (D374):                =1-VoseRayleighProb($D$1,$D$2,1)          9.85635E-05
Size of wave given > deck (D375):     =VoseRayleigh($D$2,,VoseXBounds($D$1,))   37.76709587
Resultant damage (fraction) (D376):   damage equation applied to D375           0.800691151
Expected damage over year (D377, output = mean):  =365*D374*D376                0.028805402

c) Calculation only
Expected fractional loss per day (D380):
  =VoseIntegrate("VoseRayleighProb(#,D2,0)*(1+((#-D1)/1.6)^-0.91)^-0.82",D1,200,10)   0.0000471
Expected fractional loss over the year (D381, output):  =D380*365                     0.017205699

We would like to know the expected damage cost per year as a fraction of the rig value (this is a
typical question, among others, that insurers need answered).
We could determine this by (a) pure simulation, (b) a combination of calculation and simulation or
(c) pure calculation as shown in the model of Figure 7.2.
The simulation model is simple enough: the maximum wave height is simulated for each day and then the resultant damage is obtained by writing an IF statement for when the wave height exceeds the deck
height. The model has the advantage of being easy to follow, but the probability of damage is low, so
it needs to run a long time. You also need an accurate algorithm for simulating a Rayleigh distribution.
The simulation and calculation model calculates the probability that a wave will exceed the deck
height in cell D374 (about one in 10 000). ModelRisk has equivalent probability functions for all its
distributions, whereas other Monte Carlo add-ins tend to focus only on generating random numbers, but
Appendix I11 gives the relevant formulae so you can replicate this. Cell D375 generates a Rayleigh(7.45)
distribution truncated to have a minimum equal to the deck height, i.e. we are only simulating those
waves that would cause any damage. I've used the ModelRisk generating function but @RISK, Crystal
Ball and some other simulation tools offer distribution truncation. Cell D376 then calculates the damage
fraction for the generated wave height. Finally, cell D377 multiplies the probability that a wave will


exceed the deck height by the damage it would then do and by 365 for the days in the year. Running a simulation and taking the mean (=RiskMean(D377) in @RISK, =CB.GetForeStatFN(D377,2) in Crystal Ball) will give us the required answer. This version of the model is still pretty easy to understand but has 1/365th of the simulation load and only simulates the 1 in 10 000 scenario where a wave hits the deck, so it achieves the same accuracy for about 1/3 650 000th of the iterations of the first model.
The third model performs the integral

∫ from D to ∞ of (1 + ((x − D)/1.6)^(−0.91))^(−0.82) f(x) dx

in cell D380, where f(x) is the Rayleigh(7.45) density function and D is the deck height. This is summing up the damage fraction for each possible wave height x weighted by x's probability of occurrence. The VoseIntegrate function in ModelRisk performs one-dimensional integration on the variable
"#" using a sophisticated error minimisation algorithm that gives very accurate answers with short
computation time (it took about 0.01 seconds in this model, for example). Mathematical software like
Mathematica and Maple will also perform such integrals. The advantage of this approach is that the
results are instantaneous and very accurate (to 15 significant figures!), but the downside is that you need
to know what you are doing in probability modelling (plus you need a fancier tool such as ModelRisk,
Maple, etc). ModelRisk helps out with the explanation and checking by displaying a plot of the function
and the integrated area when you click the Vf (View Function) icon. Note that for numerical integration
you have to pick a high value for the upper integration limit in place of infinity, but a quick look at the
Rayleigh(7.45) shows that its probability of being above 200 is so small that it's outside a computer's
floating point ability to display it anyway.
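If you prefer a general-purpose tool to ModelRisk's VoseIntegrate, the same integral can be evaluated with SciPy; the sketch below assumes the damage equation as I have reconstructed it above:

from scipy.integrate import quad
from scipy.stats import rayleigh

D, b = 32.0, 7.45            # deck height (m) and Rayleigh parameter

def damage_fraction(x):
    return (1 + ((x - D) / 1.6) ** -0.91) ** -0.82 if x > D else 0.0

# Expected damage fraction per day = integral over wave heights above the deck
per_day, _ = quad(lambda x: damage_fraction(x) * rayleigh.pdf(x, scale=b), D, 200)
print(per_day, per_day * 365)    # ~4.7e-5 per day, ~0.017 per year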
In summary, calculation is fast and more accurate (true, with simulation you can improve accuracy
by running the model longer, but there's a limit) and simulation is slow. On the other hand, simulation
is easier to understand and check than calculation. I often use the phrase "calculate when you can,
simulate when you can't", and when you "can't" is as much a function of the expertise level of the
reviewers as it is of the modeller. If you really would like to use a calculation method, or want to have
a mixed calculation-simulation model, but worry about getting it right, consider writing both versions
in parallel and checking they produce the same answers for a range of different parameter values.

7.3.2 Least effort to maintain
The biggest problem in maintaining a spreadsheet model is usually updating data, so make sure that
you keep the data in predictable areas (colour-coding the tabs of each sheet is a nice way). Also, avoid
Excel's data analysis features that dump the results of a data analysis as fixed values into a sheet. I
think this is dreadful programming. Software like @RISK and Crystal Ball, which fit distributions to
data, can be "hot-linked" to a dataset, which is a much better method than just exporting the fitted
parameters if you think the dataset may be altered at some point. ModelRisk has a huge range of
"hot-linking7' fit functions that will return fitted parameters or random numbers for copulas, time series
and distributions. You can sometimes replicate the same idea quite easily. For example, to fit a normal
distribution one need only determine the mean and standard deviation of the dataset if the data are
random samples, so using Excel's AVERAGE and STDEV functions on the dataset will a tomatically
update a distribution fit. Sometimes you need to run Solver, e.g. to use maximum likeliho methods
to fit to a gamma distribution, so make a macro with a button that will perform that operation (see, for
example, Figure 7.3).


Figure 7.3 Spreadsheet with automation to run Solver.

The button runs the following macro, which asks the user for the data array, runs Solver against a temporary sheet containing the log-likelihood calculation and finally asks the user where to place the results (cells D3:E3 in this case):
Private Sub CommandButton1_Click()
On Error Resume Next
Dim DataRange As Excel.Range
Dim n As Long, Mean As Double, Var As Double
Dim Alpha As Double, Beta As Double

'-------------- Selecting input data --------------
1 Set DataRange = Application.InputBox("Select one-dimensional input data array", "Data", Selection.Address, , , , , 8)
If DataRange Is Nothing Then Exit Sub
n = DataRange.Cells.Count

'---------------- Error messages ----------------
If n < 2 Then MsgBox "Please enter at least two data values": GoTo 1
If DataRange.Columns.Count > 1 And DataRange.Rows.Count > 1 Then MsgBox "Selected data is not one-dimensional": GoTo 1
If Application.WorksheetFunction.Min(DataRange.Value) <= 0 Then MsgBox "Input data must be non-negative": GoTo 1

Sheets.Add Sheets(1) ' adding a temporary sheet

'---- pasting input data into the temporary sheet ----
If DataRange.Columns.Count > 1 Then
    Sheets(1).Range("A1:A" & n).Value = Application.WorksheetFunction.Transpose(DataRange.Value)
Else
    Sheets(1).Range("A1:A" & n).Value = DataRange.Value
End If

Mean = Application.WorksheetFunction.Average(Sheets(1).Range("A1:A" & n)) ' calculating mean of data
Var = Application.WorksheetFunction.Var(Sheets(1).Range("A1:A" & n))      ' calculating variance of data
Alpha = Mean ^ 2 / Var ' best guess estimate for Alpha
Beta = Var / Mean      ' best guess estimate for Beta

'------ setting initial values for the Solver ------
Sheets(1).Range("D1").Value = Alpha
Sheets(1).Range("E1").Value = Beta

'-------- setting the LogLikelihood function --------
Sheets(1).Range("B1:B" & n).Formula = "=LOG10(GAMMADIST(A1,$D$1,$E$1,0))"

'---------- setting the objective function ----------
Sheets(1).Range("G1").Formula = "=SUM(B1:B" & n & ")"

'--------------- Launching the Solver ---------------
SOLVER.SolverReset
SOLVER.SolverOk SetCell:=Sheets(1).Range("G1"), MaxMinVal:=1, ByChange:=Sheets(1).Range("D1:E1")
SolverAdd CellRef:="$D$1", Relation:=3, FormulaText:="0.000000000000001"
SolverAdd CellRef:="$E$1", Relation:=3, FormulaText:="0.000000000000001"
SOLVER.SolverSolve UserFinish:=True
SOLVER.SolverFinish KeepFinal:=1

'------------ Remembering output values ------------
Alpha = Sheets(1).Range("D1").Value
Beta = Sheets(1).Range("E1").Value

'---------- Deleting the temporary sheet ----------
Application.DisplayAlerts = False
Sheets(1).Delete
Application.DisplayAlerts = True

'---------- Selecting output location ----------
2 Set DataRange = Application.InputBox("Select 2x1 output location", "Output", Selection.Address, , , , , 8)
If DataRange Is Nothing Then Exit Sub
n = DataRange.Cells.Count
If n < 2 Then MsgBox "Enter at least two data values": GoTo 2

'------ Pasting outputs into the selected range ------
DataRange.Cells(1, 1) = Alpha
If DataRange.Columns.Count = 2 Then DataRange(1, 2) = Beta Else DataRange(2, 1) = Beta

End Sub

A minimum limit of 0.000000000000001 is placed on alpha and beta to avoid errors, and LOG10(...) is used around the GAMMADIST(...) functions because a log-likelihood will behave less dramatically and let Solver find the solution more reliably. The moments-based estimates for alpha (= DataMean^2/DataVariance) and beta (= DataVariance/DataMean) are used as starter values for Solver so it will find the answer more quickly. If a user needs to perform some operations prior to running a model, then write a description of what needs doing and why. These days, we attach a help file with the model, and this allows us to imbed little videos, which is very helpful, but at the least try to imbed or couple the model to a pdf file with screen captures of each step.
In my experience, the other main reason a model can be hard to maintain is that it is complex and
uses many different sources of data that go out of date. When you plan out a risk analysis (Chapters 3
and 4) for a model that will be used periodically, or that could take a long time to complete, consider
whether there is a simpler model that will give answers that are pretty close in decision terms to the
more complex model being planned. If the difference in accuracy is small, it may be balanced by the
greater applicability that comes with updating the inputs more frequently.

7.3.3 Smallest file size
Megaformulae reduce the file size considerably
Maintaining large datasets in your model will increase the file size. It is better to do the analysis
outside the spreadsheet and copy across the results.
Sometimes large datasets or calculation arrays are used to construct distributions (e.g. fitting first- or
second-order non-parametric distributions to data, constructing Bayesian posterior distributions and
bootstrap analysis). Replacing these calculations with a fitted distribution can have a marked effect
on model size and speed.
ModelRisk has been designed to maximise speed and minimise memory requirements. It has a large
number of functions that will perform complex calculations in a single cell or small array. You might
also be able to achieve some of the same effect in your models with VBA code, particularly if you
need to perform iterative loops.

7.3.4 How many iterations of a model to run
You will often see risk analysis reports, or papers in journals, that show the results and tell you that this
was based on 10 000 (or whatever) Latin hypercube (or whatever) iterations of the model. I suppose
that may sometimes be useful to know, but not often. The author is usually trying to communicate that
the model was run long enough for the results to be stable. The problem is that, for one model trying
to determine a mean, 500 iterations may be good enough; for another trying to determine a 99.9th
percentile, 100 000 iterations might be needed. It also depends on how sensitive the decision question
is to the output's accuracy. A frequent question that pops up in our courses is "how many iterations do
I need to run", and you can see there is no absolute answer to that. A short answer, burdened with many
caveats, is "no less than 3 0 0 if you are interested in the entire output distribution. At 300 iterations you
start to get a reasonably well-defined cumulative distribution, so you can approximately read off the 50th
and 85th percentiles, for example, and the mean is pretty well determined for most output distributions.
At the same time, if you export the generated values from two or more random variables in your model
to produce scatter plots, 300 is really the minimum you need to get any sense of the patterns that they
produce (i.e. their joint distribution). We usually have our models set to run 3000 iterations as a default
(but obviously increase that figure if a particularly high level of accuracy is warranted), because we plot
a great deal of scatter plots from generated data, and this is about the right number of points before

the scatter plot gets clogged up, and certainly enough for all the percentiles and statistics to be well specified.

Figure 7.4 Comparison of cumulative distribution plots for 20 model runs each of 3000 and 300 Monte Carlo iterations for a well-behaved output (i.e. a nice smooth curve).
Figure 7.4 shows what type of variation you would typically get for a cumulative distribution between
runs of 300 iterations and of 3000 iterations. Since most models include an element of guesswork in
the choice of model, distributions or parameter values to use, one should not usually be too concerned
about exact precision in the Monte Carlo results, but you'll see that 300 iterations is probably the least
level of accuracy you might find acceptable.
Figure 7.5 shows the same input and output plotted together as a scatter plot for 300 and 3000
iterations. We find scatter plots to be a great, intuitive presentation of how, among others, the input
variability influences the output value. You'll see that the pattern is just about visible for 300 iterations,
and just starting to get clogged up at 3000 iterations (of course, if you run more than 3000 iterations,
you can plot a sample of just 3000 of them to keep the scatter plot clear). If the pattern were simpler,
the left-hand panel of 300 iterations would of course be clearer.
In general, you'll have two opposing pressures:
Too few iterations and you get inaccurate outputs, graphs (particularly histogram plots) that look
"scruffy".
Too many iterations and it takes a long time to simulate, and it may take even longer to plot graphs,
export and analyse data, etc., afterwards. Export the data into Excel and you may also come upon row limitations, and limitations on the number of points that can be plotted in a chart.
There will usually be one or more statistics in which you are interested from your model outputs, so
it would be quite natural to wish to have sufficient iterations to ensure a certain level of accuracy.
Typically, that accuracy can be described in the following way: "I need the statistic Z to be accurate to within ±δ with confidence α".
I will show you how you can determine the number of iterations you need to run to get some specified
level of accuracy for the most common statistics: the mean and cumulative probabilities. The example
models let you monitor the level of accuracy in real time. Note that all models assume that you are using
Monte Carlo sampling. This will therefore somewhat overestimate the number of iterations you'll need
if you are using Latin hypercube sampling (which we recommend, in general). That said, in practice,

Latin hypercube sampling will only offer useful improvement when a model is linear, or when there are very few distributions in the model.

Figure 7.5 Comparison of scatter plots for model runs of 300 and 3000 Monte Carlo iterations (x axis: input parameter value).
Iterations to run to get sufficient accuracy for the mean

Monte Carlo simulation estimates the true mean μ of the output distribution by summing all of the generated values xi and dividing by the number of iterations n:

x̄ = (1/n) Σi xi

If Monte Carlo sampling is used, each xi is iid (an independent, identically distributed random variable). The central limit theorem then says that the distribution of the estimate of the true mean is (asymptotically)

given by

x̄ = Normal(μ, σ/√n)

where σ is the true standard deviation of the model's output.
Using a statistical principle called the pivotal method, we can rearrange this equation to make it an equation for μ:

μ = Normal(x̄, σ/√n)     (7.1)

Figure 7.6 shows the cumulative form of the normal distribution for Equation (7.1).

Figure 7.6 Cumulative distribution plot for the normal distribution of Equation (7.1) (x axis marked from μ − 3σ/√n to μ + 3σ/√n).

Specifying the level of confidence we require for our mean estimate translates into a relationship between δ, σ and n. More formally, this relationship is

Φ⁻¹((1 + α)/2) · σ/√n ≤ δ     (7.2)

where Φ⁻¹(·) is the inverse of the normal cumulative distribution function. Rearranging Equation (7.2) and recognising that we want to have at least this accuracy gives a minimum value for n:

n ≥ (σ Φ⁻¹((1 + α)/2) / δ)²

We have one problem left: we don't know the true output standard deviation σ. It turns out that we can estimate this perfectly well for our purposes by taking the standard deviation of the first few


Figure 7.7 Models in @RISK and Crystal Ball to monitor whether the simulation mean has reached a required accuracy.

Output from model (D2): 7.466257485
Required level of accuracy delta about mean (E3): +/- 0.01, with confidence alpha (H3): 90 %

Calculation with Crystal Ball
Standard deviation of o/p iterations so far (E6): 3.921402 (our estimate of sigma)
Iterations left to do (D7): 416042

Calculation with @RISK
Standard deviation of o/p iterations so far (E10): 3.921402 (our estimate of sigma)
Iterations left to do (D11): 416042

Formulae table
Crystal Ball
E6    =CB.GetForeStatFN(D2,5)
D7    =IF((E6*NORMSINV((1+H3)/2)/E3)^2-CB.IterationsFN()>0, ROUNDUP((E6*NORMSINV((1+H3)/2)/E3)^2-CB.IterationsFN(),0),"Sufficient accuracy achieved")
@RISK
E10   =RiskStdDev(D2)
D11   =IF((E10*NORMSINV((1+H3)/2)/E3)^2-RiskCurrentIter()>0, ROUNDUP((E10*NORMSINV((1+H3)/2)/E3)^2-RiskCurrentIter(),0),"Sufficient accuracy achieved")

(say 50) iterations. The model in Figure 7.7 shows how you can do this continuously, using Excel's function NORMSINV to return values for Φ⁻¹(·).
If you name cell D7 or D11 as an output, together with any other model outputs you are actually interested in, and select the "Pause on Error in Outputs" option in your host Monte Carlo add-in, it will automatically stop simulating when the required accuracy is achieved because the cell returns the "Sufficient accuracy achieved" text instead of a number.
Iterations to run to get sufficient accuracy for the cumulative probability F(x) associated with a particular value x

Percentiles closer to the 50th percentile of an output distribution will reach a stable value far more quickly than percentiles towards the tails. On the other hand, we are often most interested in what is going on in the tails because that is where the risks and opportunities lie. For example, Basel II and credit rating agencies often require that the 99.9th percentile or greater be accurately determined. The following technique shows you how you can ensure that you have the required level of accuracy for the percentile associated with a particular value.
Your Monte Carlo add-in will estimate the cumulative percentile F(x) of the output distribution associated with a value x by determining what fraction of the iterations fell at or below x. Imagine that x is actually the 80th percentile of the true output distribution. Then, for Monte Carlo simulation, the generated value in each iteration independently has an 80 % probability of falling below x: it is a binomial process with probability p = 80 %. Thus, if so far we have had n iterations and s have fallen at or below x, the distribution Beta(s + 1, n − s + 1) describes the uncertainty associated with the true cumulative percentile we should associate with x (see Section 8.2.3).
When we are estimating the percentile close to the median of the distribution, or when we are performing a large number of iterations, s and n will both be large, and we can use a normal approximation

+

+

Chapter 7 Building and running a model

to the beta distribution:
Beta(s

+ 1, n - s + 1) x Normal

159

i-m
P,

where S = is the best-guess estimate for F ( x ) . Thus, we can produce a relationship similar to that
in Equation (7.2) for determining the number of iterations to get the required precision for the output
mean:

Rearranging Equation (7.4) and recognising that we want to have at least this accuracy gives a
minimum value for n:

A model can now be written in a very similar fashion to Figure 7.7
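The percentile version follows the same pattern. A hedged Python sketch (illustrative only; the pilot values and the chosen x are invented):

import numpy as np
from scipy.stats import norm

def extra_iterations_for_percentile(samples, x, delta, alpha=0.90):
    # Iterations still needed so that the estimate of F(x) = P(output <= x)
    # is within +/-delta of its true value with confidence alpha.
    n = len(samples)
    s_hat = np.mean(np.asarray(samples) <= x)  # best-guess estimate of F(x)
    z = norm.ppf((1 + alpha) / 2)
    n_min = s_hat * (1 - s_hat) * (z / delta) ** 2
    return max(0, int(np.ceil(n_min)) - n)

rng = np.random.default_rng(2)
pilot = rng.normal(0, 1, size=200)             # illustrative pilot run
print(extra_iterations_for_percentile(pilot, x=1.64, delta=0.005))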

7.4 Most Common Modelling Errors
This section describes, and provides examples for, the three most common mistakes we come across
in auditing risk models, even at the more elementary level. These mistakes probably constitute around
90 % of the errors we see. I strongly recommend studying them, and going through the examples
thoroughly:
Common error 1. Calculating means instead of simulating scenarios.
Common error 2. Representing an uncertain variable more than once in a model.
Common error 3. Manipulating probability distributions as if they were fixed numbers.
Common error 1: calculating means instead of simulating scenarios

When we first start thinking about risk, it is quite natural to want to convert the impact of a risk to
a single number. For example, we might consider that there is a 20% chance of losing a contract,
which would result in a loss of income of $100 000. Put together, a person might reason that to be a
risk of some $20 000 (i.e. 20 % * $100 000). This $20 000 figure is known as the "expected value" of
the variable. It is the probability weighted average of all possible outcomes. So, the two outcomes are
$100 000 with 20 % probability and $0 with 80 % probability:
Mean risk (expected value) = 0.2 * $100 000 + 0.8 * $0 = $20 000

Calculating the expected values of risks might also seem a reasonable and simple method for comparing risks. For example, in Table 7.1, risks A to J are ranked in descending order of expected cost:


Table 7.1 A list of probabilities and impacts for 10 risks.

Risk    Probability    Impact if occurs ($000)    Expected impact ($000)
A       0.25           400                        100
. . .
Total expected impact                             367

If a loss of $500 000 or more would ruin your company, you may well rank the risks differently: risks C, D, I and, to a lesser extent, J pose a survival threat to your company. Note also that you may value the impact of risk D as no more severe than that of risk C because, if either of them occurs, your company has gone bust.
On the other hand, if risk A occurs, giving you a loss of $400k, you are precariously close to ruin: it would then take just one more of the other risks, except F or H on their own (though both together would do it), and you've gone bust. Looking at the sum of the expected values gives you no appreciation of how close you are to ruin.
Figure 7.8 plots the distribution of possible outcomes for this set of risks.

Figure 7.8 Probability distribution of total impact from risks A to J.


From a risk analysis point of view, by representing the impact of a risk by its expected value, we have
removed the uncertainty (i.e. we can't see the breadth of different outcomes), which is a fundamental
reason for doing risk analysis in the first place. That said, you might think that people running Monte
Carlo simulations would be more attuned to describing risks with distributions rather than single values,
but this is nonetheless one of the most common errors.
Another, slightly more disguised example of the same error is when the impact is uncertain. For
example, let's imagine that there will be an election this year and that two parties are running: the
Socialist Democrats Party and the Democratic Socialists Party. The SDP are currently in power and
have vowed to keep the corporate tax rate at 17 % if they win the election. Political analysts reckon
they have about a 65 % chance of staying in power. The DSP promise to lower the corporate tax rate
by 1-4 %, most probably 3 %. We might choose to express next year's corporate tax rate as

Rate = 0.35 * VosePERT(13 %, 14 %, 16 %) + 0.65 * 17 %

Checking the formula by simulating, we'd get a probability distribution that could give us some
comfort that we've assigned uncertainty properly to this parameter. However, a correct model would
have drawn a value of 17 % with a probability of 0.65 and a random value from the PERT distribution
with a probability of 0.35.
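The difference between the two approaches is easy to demonstrate by simulation. The Python sketch below is illustrative only (the pert helper is an assumption built from a scaled beta distribution, not ModelRisk's implementation): the weighted-average formula produces a narrow band of blended rates, whereas the correct model returns either exactly 17 % or a PERT-distributed rate in each iteration.

import numpy as np

rng = np.random.default_rng(3)
N = 100_000

def pert(rng, a, m, b, size, lam=4.0):
    # Simple PERT sample via a scaled beta (illustrative helper, not ModelRisk's)
    alpha = 1 + lam * (m - a) / (b - a)
    beta_ = 1 + lam * (b - m) / (b - a)
    return a + (b - a) * rng.beta(alpha, beta_, size)

dsp_rate = pert(rng, 0.13, 0.14, 0.16, N)       # tax rate if the DSP win

wrong = 0.35 * dsp_rate + 0.65 * 0.17           # incorrect: blends the two scenarios
sdp_wins = rng.uniform(size=N) < 0.65
right = np.where(sdp_wins, 0.17, dsp_rate)      # correct: one scenario per iteration

print(round(wrong.min(), 4), round(wrong.max(), 4))   # narrow band of blended rates
print(round((right == 0.17).mean(), 3))               # ~0.65 of iterations sit at 17 %
print(round(right.min(), 4), round(right.max(), 4))   # the rest spread over the PERT range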
Common error 2: representing an uncertain variable more than once in a model

When we develop a large spreadsheet model, perhaps with several linked sheets in the same file, it is
often convenient to have some parameter values that are used in several sheets appearing in each of
those sheets. This makes it quicker to write formulae and trace back precedents in a formula. Even
in a deterministic model (i.e. a model where there are only best-guess values, not distributions) it is
important that there is only one place in the model where the parameter value can be changed (at Vose
Consulting we use the convention that all changeable input parameter values or distributions are labelled
blue). There are two reasons: firstly it is easier to update the model with new parameter values, and
secondly it avoids the potential mistake of only changing the parameter values in some of the cells in
which it appears, forgetting the others, and thereby having a model that is internally inconsistent. For
example, a model could have a parameter "Cargo (mt)" in sheet 1 with a value of 10 000 and a value
of 12 000 in sheet 2.
It becomes even more important to maintain this discipline when we create a Monte Carlo model if
that parameter is modelled with a distribution. Although each cell in the model might carry the same
probability distribution, left unchecked each distribution will generate different values for the parameter
in the same iteration, thus rendering the generated scenario impossible.
If it really is important to you to have the probability distribution formula in each cell where the
parameter is featured (perhaps because you wish to see what distribution equation was used without
having to switch to the source sheet), you can make use of the U-parameter in ModelRisk's simulation
functions to ensure that the same value is being generated in each place:
Cell A1 := VoseNormal(100, 10, Random1)
Cell A2 := VoseNormal(100, 10, Random1)

where Random1 is a Uniform(0, 1) distribution placed somewhere in the model. You can achieve the same thing using a 100 % rank order correlation in @RISK or Crystal Ball, for example, but this will


only work when the simulation is running because rank order correlation generates a set of values before
a simulation run and orders them; when you look at the model stepping through some scenarios, they
won't match.
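Outside ModelRisk, the same effect can be reproduced by generating one Uniform(0, 1) value per iteration and pushing it through the inverse cumulative distribution wherever the parameter appears. A minimal Python sketch (illustrative only; the parameter values are the ones used above):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

for _ in range(3):                                    # a few iterations of a toy model
    u = rng.uniform()                                 # one shared Uniform(0,1) driver
    cargo_sheet1 = norm.ppf(u, loc=100, scale=10)     # parameter as used in sheet 1 ...
    cargo_sheet2 = norm.ppf(u, loc=100, scale=10)     # ... and again in sheet 2
    assert cargo_sheet1 == cargo_sheet2               # identical in every scenario
    print(cargo_sheet1)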
The error described so far is where the formula for the distribution of a random variable is featured
in more than one cell of a spreadsheet model. These errors are quite easy to spot. Another form of the
same error is where two or more distributions incorporate the same random variable in some way. For
example, consider the following problem.
A company is considering restructuring its operations with the inevitable layoffs, and wishes to analyse
how much it would save in the process. Looking at just the office space component, a consultant estimates
that, if the company were to make the maximum number of redundancies and outsource some of its
operations, it would save $PERT(1.1, 1.3, 1.6)M of office space costs. On the other hand, just making
the redundancies in the accounting section and outsourcing that activity, it could save $PERT(0.4, 0.5,
0.9)M of office space costs.
It would be quite natural, at first sight, to put these two distributions into a model and run a simulation
to determine the savings for the two redundancy options. On their own, each cost saving distribution
would be valid. We might also decide to calculate in a spreadsheet cell the difference between the
two savings, and here we would potentially be making a big mistake. Why? Well, what if there is an
uncertain component that is common to both office cost savings? For example, what if inside these cost
distributions there is the cost of getting out of a current lease contract, uncertain because negotiations
would need to take place. The problem is that, by sampling from these two distributions independently, we are not recognising the common element which, unless it is a fixed value, induces some level of correlation between the two savings.
The takeaway message from this example is: consider whether two or more uncertain parameters
in your models share in some way a common element. If they do, you will need to separate out that
common element and thereby allow it to appear just once in your model.
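As a hedged illustration of separating out a common element, the Python sketch below invents a decomposition of the two office-cost savings into a shared lease-exit cost plus option-specific parts (all parameter values are hypothetical). Because the shared element appears only once, it cancels correctly when the difference between the two options is calculated, whereas sampling the two published savings distributions independently would miss the correlation they share.

import numpy as np

rng = np.random.default_rng(5)
N = 100_000

def pert(rng, a, m, b, size, lam=4.0):
    # Simple PERT sample via a scaled beta (illustrative helper)
    alpha = 1 + lam * (m - a) / (b - a)
    beta_ = 1 + lam * (b - m) / (b - a)
    return a + (b - a) * rng.beta(alpha, beta_, size)

lease_exit = pert(rng, 0.2, 0.3, 0.5, N)                    # hypothetical shared cost ($M)
option_full = lease_exit + pert(rng, 0.8, 1.0, 1.2, N)      # full restructuring
option_accounts = lease_exit + pert(rng, 0.1, 0.2, 0.5, N)  # accounting section only

print(np.std(option_full - option_accounts))                # lease-exit cost cancels out

# Sampling the two savings independently ignores the shared element:
independent = pert(rng, 1.1, 1.3, 1.6, N) - pert(rng, 0.4, 0.5, 0.9, N)
print(np.std(independent))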
Common error 3: manipulating probability distributions like we do with fixed numbers

At school we learn simple rules of arithmetic and, later, when we take algebra, we learn that

A + B = C therefore C − A = B
D * E = F therefore F / D = E

The problem is that these trusted rules do not apply so universally when manipulating random variables. This section explains how and when these simple algebraic rules no longer work, and shows you
how to identify them in your model and how to make the appropriate corrections.
An example

Most deterministic spreadsheet models consist of linked formulae that contain nothing more complicated than simple operations like +, −, * and /. When we decide to start adding uncertainty to the values of the components in the model, it seems natural enough simply to replace a fixed value with a probability


distribution describing our uncertainty. So, for example, consider the simple model for a company offering some credit service:

Money borrowed by a client M:   €10 000
Number of clients n:            6500
Interest rate per annum r:      7.5 %
Yearly revenue:                 M * n * r = €4 875 000

The model can now be "risked":

Money borrowed by a client M:   Lognormal(€10 000, €4000)
Number of clients n:            PERT(6638, 6500, 8200)
Interest rate per annum r:      7.5 %
Yearly revenue:                 M * n * r

The best-guess estimates of the money borrowed by a client and of the number of clients have been replaced by distributions, but the model is otherwise unchanged. This model is probably very wrong. The error is most easily seen by watching random values being generated on screen. Look at the values that are being used for the entire client base and compare with where these values sit on the Lognormal(10 000, 4000) distribution.
For example, the Lognormal(10 000, 4000) distribution has 10 % of its probability below €5 670. Thus, in 10 % of its iterations it will generate a value below this figure, and that value will be used for all customers. The lognormal distribution undoubtedly reflects the variability that is expected between customers (perhaps, for example, it was fit to a relevant dataset of amounts individual customers have previously borrowed). The probability that two randomly selected customers will both borrow less than €5 670 is 10 % * 10 % = 1 %. The probability that all (say) 6500 customers borrow less than €5 670, if the amounts they borrow are independent, is 0.1^6500, i.e. effectively impossible, yet our model gives it a 10 % probability.
In order to model this problem correctly, we need to consider what are the sources of uncertainty
about the amount a customer borrowed. If the source is specific to each individual client, then the
amounts can be considered independent and the techniques of Chapter 11 should be applied. If there
is some systematic influence (like the state of the economy, recent bad press for companies offering
credit, etc.), it will have to be separated out from the individual, independent component.
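A quick simulation makes the scale of this error obvious. The Python sketch below (an illustration; 200 iterations is an arbitrary choice) compares the incorrect model, where one lognormal draw is applied to every client, with a model that treats the clients' borrowings as independent.

import numpy as np

rng = np.random.default_rng(6)
n_clients, r = 6500, 0.075

# Lognormal with mean 10 000 and sd 4 000, expressed via its underlying normal
sigma = np.sqrt(np.log(1 + (4_000 / 10_000) ** 2))
mu = np.log(10_000) - sigma ** 2 / 2

# Incorrect: one draw per iteration applied to every client
wrong = rng.lognormal(mu, sigma, 10_000) * n_clients * r

# Correct (if clients are independent): sum individual draws in each iteration
right = np.array([rng.lognormal(mu, sigma, n_clients).sum() * r for _ in range(200)])

print(np.std(wrong), np.std(right))   # the incorrect model hugely exaggerates the spread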
Let's look at another example. The sum of two independent Uniform(0, 1) distributions is . . . what
do you think? The answer often surprises people. It is hard to imagine a simpler problem, yet when we
canvass a class we get quite a range of answers. Perhaps a Uniform(0, 2)? That's the most common
response. Or something looking a little normal? The answer is a Triangle(0, 1, 2), so we could write

U(0, 1) + U(0, 1) = T(0, 1, 2)

The first message in this example is that it is difficult for a person not very well versed in risk analysis
modelling to be able to predict well the results of even the most trivial model. Of course, that makes it
very hard to check the model and be comfortable about its results.
On to the next question we often pose our class:
T(0, 1, 2) − U(0, 1) = ?


Figure 7.9 A plot of random samples of C against A, where A = U(0, 1), B = U(0, 1) and C = A + B.

Now wise to the trickiness of the question, most class participants are pretty sure that their first guess
(i.e. = U(0, 1)) is wrong but don't have anything else to suggest. The answer is a symmetric distribution
that looks a little normal, stretching from −1 to 2 with a peak at 0.5. But why isn't it U(0, 1)? An easy
way to visualise this is to run a simulation adding two Uniform(0, 1) distributions and plotting the
generated values from one uniform distribution together with the calculated sum of them both. You get
a scatter plot that looks like that in Figure 7.9.
The line y = x shows the lowest value of the triangular distribution C for any given value of the uniform distribution A, and the line y = 1 + x is the highest value, which makes intuitive sense. The uniform vertical distribution of points between these two lines is the effect of the second Uniform(0, 1) distribution B. Also note that all the generated values lie uniformly (but randomly) between these two lines. This actually is quite helpful in visualising why the sum of two Uniform(0, 1) distributions is a Triangle(0, 1, 2): project all the dots onto the y axis. Can you extend this graph to work out graphically what U(0, 1) + U(0, 3) would look like?
The point of the graph is to show you that there is a strong dependency pattern between these two distributions (a uniform and the triangle sum), which would need to be taken into account if one wished to extract one of the uniform distributions back out of the sum. For example, the formula below does just that:

B := VoseUniform(IF(A < 1, 0, A − 1), IF(A > 1, 1, A))


Try to follow the logic for the formula for B from the graph. B will generate a Uniform(0, 1)
distribution with the right dependency relationship with A to leave C a Uniform(0, 1) distribution too.
To recap, the problem is that we have three variables linked together as follows:

A + B = C

We know the distributions for A and C. How do we find B, and how do we simulate A, B and C all together? The simple example above using two uniform distributions allows us to simulate A, B and C all together, but only because we assumed A and B were independent and the problem was very simple. In general, we cannot correctly determine B, so we need either to construct a model that avoids having to perform such a calculation or admit that we have insufficient information to specify B.
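The two-uniform case can at least be checked numerically. The Python sketch below is an illustration under the reading that the formula above draws B uniformly between max(0, T − 1) and min(1, T), where T is the Triangle(0, 1, 2) value; it confirms that both reconstructed variables are Uniform(0, 1) marginally even though each depends on T.

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(7)
N = 200_000

T = rng.uniform(0, 1, N) + rng.uniform(0, 1, N)      # the Triangle(0, 1, 2) sum

# Conditional uniform: B | T ~ Uniform(max(0, T-1), min(1, T))
B = rng.uniform(np.maximum(0, T - 1), np.minimum(1, T))
A = T - B                                            # the other uniform

# Both should pass a test of being Uniform(0, 1) despite the dependency on T
print(kstest(B, "uniform").pvalue, kstest(A, "uniform").pvalue)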

Chapter 8

Some basic random processes
8.1 Introduction
If you want to get the most out of the risk analysis and statistical modelling tools that are available,
you really need to understand the conceptual thinking behind random processes and the equations and
distributions that result, and be able to identify where these random processes occur in the real world.
In this chapter we look at the binomial, Poisson and hypergeometric processes first because they share a
common basis, and a great many risk analysis problems can be tackled with a good knowledge of
just these three processes. I've added the central limit theorem here too because it explains a lot about
the behaviour of distributions. We'll look at the theory and assumptions behind each process, and the
distributions that are used in their modelling. This approach provides us with an excellent opportunity to
become very familiar with a number of important distributions, and to see the relationships between them,
even between the distributions of the different random processes. Then we'll look at some extensions
to these processes that greatly increase their range of applications. Finally, we look at a number of
problems.
There are a number of other random processes discussed in this book relating to the sums of random variables (Chapter 11), time series modelling (Chapter 12) and correlated variables (Chapter 13).
Chapter 9 on statistics relies heavily on an understanding of the random processes described here.

8.2 The Binomial Process
A binomial process is a random counting system where there are n independent identical trials, each one of which has the same probability of success p, and which produces s successes from those n trials (where 0 ≤ s ≤ n and n > 0, obviously). There are thus three quantities {n, p, s} that between them completely describe a binomial process. Associated with each of these three quantities are three distributions that describe the uncertainty about, or variability of, these quantities. Each distribution requires knowledge of two of the quantities in order to estimate the third.
The simplest example of a binomial process is the toss of a coin. If we define "heads" as a success,
each toss has the same probability of success p (0.5 for a fair coin). Then, for a given number of trials n
(tosses of a coin), the number of successes will be s (the number of "heads"). Each trial can be thought
of as a random variable that returns either a 1 with probability p or a 0 with probability (1 - p). Such
a trial is often known as a Bernoulli trial, and the probability (1 - p ) is often given the label q .

8.2.1 Number of successes in n trials
We start our exploration of the binomial process by looking at the probability of a certain number of
successes s for a given number of trials n and probability of success p. Imagine we have one toss of


Figure 8.1 Event trees for the tossing of (a) one coin and (b) two coins.

a coin. The two outcomes are "heads" (H) with probability p and "tails" (T) with probability (1 − p), as shown in the event tree of Figure 8.1(a). If we have two tosses of a coin there are four possible outcomes, as shown in Figure 8.1(b), namely HH, HT, TH and TT, where HT means "heads" followed by "tails", etc. These outcomes have probabilities p², p(1 − p), (1 − p)p and (1 − p)² respectively. If we are tossing a fair coin (i.e. p = 0.5), then each of the four outcomes has the same probability of 0.25.
Now, the binomial process considers each success to be identical and therefore does not differentiate between the two events HT and TH: they are both just one success in two trials. The probability of one success in two trials is then just 2p(1 − p) or, for a fair coin, 2 * 0.25 = 0.5. The two in this equation is the number of different paths that result in one success in two trials. Now imagine that we toss a coin 3 times. The eight outcomes are: HHH, HHT, HTH, HTT, THH, THT, TTH and TTT. Thus, one event produces three "heads", three events produce two "heads", three events produce one "head" and one event produces no "heads" for three coin tosses. In general, the number of ways that we can get s successes from n trials can be calculated directly using the binomial coefficient nCs, which is given by

nCs = n! / (s!(n − s)!)

We can check this is right by choosing n = 3 (remembering that 0! = 1), then

3C0 = 3!/(0!3!) = 1,  3C1 = 3!/(1!2!) = 3,  3C2 = 3!/(2!1!) = 3,  3C3 = 3!/(3!0!) = 1


which match the number of combinations we have already calculated. Each of the ways of getting s successes in n trials has the same probability, namely p^s (1 − p)^(n−s), so the probability of observing x successes in n trials is given by

P(X = x) = nCx * p^x (1 − p)^(n−x)

which is the probability mass function of the Binomial(n, p) distribution. In other words, the number of successes s one will observe in n trials, where each trial has the same probability of success p, is given by

s = Binomial(n, p)

Figure 8.2 shows this distribution for four different combinations of n and p. The binomial distribution was first derived by Bernoulli (1713).

8.2.2 Number of trials needed to achieve s successes
We have seen how the binomial distribution allows us to model the number of successes that will occur
in n trials where we know the probability of success p . Sometimes, we know how many successes
we wish to have, we know the probability p and we would like to know the number of trials that we
will have to complete in order to achieve the s successes, assuming we stop once the sth success has

Figure 8.2 Examples of the binomial distribution.


occurred. In this case, n is the random variable. Now that we have the binomial distribution, we can readily determine the distribution for n. Let x be the total number of failures. The total number of trials we will execute is then (s + x), and by the (s + x − 1)th trial we must have observed (s − 1) successes and x failures (since the very last trial is, by assumption, a success). The probability of (s − 1) successes in (s + x − 1) trials is given immediately by the binomial distribution as

P(x) = [(s + x − 1)! / ((s − 1)! x!)] p^(s−1) (1 − p)^x

The probability of this being followed by a success is the same equation multiplied by p, i.e.

P(x) = [(s + x − 1)! / ((s − 1)! x!)] p^s (1 − p)^x

which is the probability mass function of the negative binomial distribution NegBin(s, p). In other words, the NegBin(s, p) distribution returns the number of failures one will have before observing s successes. The total number of trials n is thus given by

n = s + NegBin(s, p)

Figure 8.3 shows various negative binomial distributions. If s = 1, then the distribution (known as
the geometric distribution) is very right skewed and p(0) = p , i.e. the probability that there will be zero
failures equals p, the probability that the first trial is a success. We can also see that, as s gets larger,
the distribution looks more like a normal distribution. In fact, it is common to approximate the negative

Figure 8.3 Examples of the negative binomial distribution.


binomial distribution with a normal distribution under certain circumstances where s is large, in order to avoid calculating the large factorials for P(x) above. A negative binomial distribution shifted k values along the domain is sometimes called a binomial waiting time distribution, or a Pascal distribution.
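For readers who like to check such results numerically, the short Python sketch below simulates the number of trials needed (scipy's nbinom counts the failures before the sth success, matching the definition in the text).

import numpy as np
from scipy.stats import nbinom

s, p = 3, 0.5
rng = np.random.default_rng(9)
failures = nbinom.rvs(s, p, size=100_000, random_state=rng)
trials = s + failures                          # n = s + NegBin(s, p)

print(trials.mean(), s / p)                    # simulated mean vs the theoretical s/p
print(nbinom.pmf(0, 1, p))                     # geometric case s = 1: P(0 failures) = p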

8.2.3 Estimate of probability of success p
These results for the binomial and negative binomial distributions are both modelling variability: that is
to say, they are returning probability distributions of possible future outcomes. At times, however, we
are looking back at the results of a binomial process and wish to determine one of the parameters. For
example, we may have observed n trials of which s were successes, and from that information we would
like to estimate p . This binomial probability is a fundamental property of the stochastic system and can
never be observed, but we can become progressively more certain about its true value by collecting
data. As we shall see in Section 9.2.2, we can readily quantify our uncertainty about the true value of p
by using a beta distribution. In brief, if we have no prior information about p , or do not wish to assume
any prior information about p, then it is quite natural to use a uniform prior for p, and, through Bayes' theorem, we have the equation

f(p) ∝ p^s (1 − p)^(n−s)

which is just the Beta(s + 1, n − s + 1) distribution, so

p = Beta(s + 1, n − s + 1)

The beta distribution can also be used in the event that we have an informed opinion about the value of p prior to collecting data. In such cases, providing we can reasonably model our prior opinion about p with a beta distribution of the form Beta(a, b), the posterior turns out to be a Beta(a + s, b + n − s) distribution, because the beta distribution is conjugate to the binomial distribution (see Section III.7.1).
Figure 8.4 illustrates a number of beta distributions.
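A small Python illustration of these estimates (the observed counts and the prior parameters are invented for the example):

from scipy.stats import beta

s, n = 7, 20                                   # observed successes and trials

posterior = beta(s + 1, n - s + 1)             # uniform prior, as in the text
print(posterior.mean(), posterior.interval(0.95))

a, b = 2, 8                                    # an informed Beta(a, b) prior
informed = beta(a + s, b + n - s)              # conjugate update
print(informed.mean(), informed.interval(0.95))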

8.2.4 Estimate of the number of trials n that were completed
Consider the situation where we have observed s successes and know the probability of success p, but
would like to know how many trials were actually done to have observed those successes. We wish to
estimate a value that is fixed, so we require a distribution that represents our uncertainty about what the
true value is. There are two possible situations: we either know that the trials stopped on the sth success
or we do not. If we know that the trials stopped on the sth success, we can model our uncertainty about
the true value of n as
n = s + NegBin(s, p)

If, on the other hand, we do not know that the last trial was a success (though it could have been), then our uncertainty about n is modelled as

n = s + NegBin(s + 1, p)
Both of these formulae result from a Bayesian analysis with uniform priors for n. We will now derive
these two results using standard Bayesian inference. The reader unfamiliar with this technique should
refer to Section 9.2. Let x be the number of failures that were carried out before the sth success. We


Figure 8.4 Examples of the beta distribution.

will use a uniform prior for x, i.e. p(x) = c, and, from the binomial distribution, the likelihood function is the probability that at the (s + x − 1)th trial there had been (s − 1) successes and then the (s + x)th trial was a success, which is just the negative binomial probability mass function:

l(X|x) = [(s + x − 1)! / ((s − 1)! x!)] p^s (1 − p)^x

As we are using a uniform prior, and the equation for l(X|x) comes directly from a probability distribution (so it must sum to unity), we can dispense with the formality of normalising the posterior distribution and observe that

f(x) ∝ [(s + x − 1)! / ((s − 1)! x!)] p^s (1 − p)^x

i.e. that x = NegBin(s, p).
In the second case, we do not know that the last trial was a success, only that, in however many trials were completed, there were just s successes. We have the same uniform prior for the number of failures, but our likelihood function is just the binomial probability mass function, i.e.

l(X|x) = [(s + x)! / (s! x!)] p^s (1 − p)^x

As this does not have the form of a probability mass function of a distribution, we need to complete the Bayesian analysis and normalise, so

f(x) = l(X|x) / Σ l(X|x)    (8.1)

where the sum runs over x = 0, 1, 2, . . .


The sum in the denominator equals 1/p. This can easily be seen by substituting s = a − 1, which gives

Σ [(a + x − 1)! / ((a − 1)! x!)] p^(a−1) (1 − p)^x

If the exponent for p were equal to a instead of (a − 1), we would have the probability mass function of the negative binomial distribution, which would then sum to unity, so our denominator must sum to 1/p.
The posterior distribution from Equation (8.1) then reduces to

f(x) = [(s + x)! / (s! x!)] p^(s+1) (1 − p)^x

which is just a NegBin(s + 1, p) distribution.

8.2.5 Summary of results for the binomial process
The results are shown in Table 8.1.
Table 8.1 Distributions of the binomial process.

Quantity                  Formula                             Notes
Number of successes       s = Binomial(n, p)
Probability of success    p = Beta(s + 1, n − s + 1)          Assuming a uniform prior
                            = Beta(a + s, b + n − s)          Assuming a Beta(a, b) prior
Number of trials          n = s + NegBin(s, p)                When the last trial is a success
                            = s + NegBin(s + 1, p)            When the last trial is not known to be a success

8.2.6 The beta-binomial process
An extension of the binomial process is to consider the probability p to be a random variable. A natural candidate to model this variability is the Beta(α, β) distribution because it lies on [0, 1] and can take a lot of shapes, so it offers a great deal of flexibility.
The beta-binomial distribution models the number of successes:

s = Binomial(n, Beta(α, β))

The beta-negative binomial models the number of failures that will occur to achieve s successes:

x = NegBin(s, Beta(α, β))

Both distributions are included in ModelRisk.
It is important to remember that in the beta-binomial process the same value of p is applied to all the binomial trials, meaning that, if p is randomly 0.4 (say) for one trial, it is 0.4 for all the others too. If p were randomly varying between each trial, we would have each trial being an independent Bernoulli(Beta(α, β)), but since a Bernoulli distribution can only be 0 or 1, this condenses to a set of


independent Bernoulli(α/(α + β)) trials, where α/(α + β) is the mean of the beta distribution, and a collection of n such trials would therefore be just a Binomial(n, α/(α + β)).

8.2.7 The multinomial process
Whereas in the binomial process there are only two possible outcomes of a trial (0 or 1, yes or no, male
or female, etc.), the multinomial process allows for multiple outcomes. The list of possible outcomes
must be exhaustive, meaning a trial cannot result in something that isn't listed as an outcome. For
example, if we throw a die there are six possible mutually exclusive (they can't happen at the same
time) and exhaustive (one must occur) outcomes.
There are three distributions associated with the multinomial process:
Multinomial(n, {p1 . . . pk}), which describes the number of successes in n trials that fall into each of the k categories. Its joint probability mass function parallels the binomial equation:

P(s1, . . . , sk) = [n! / (s1! s2! . . . sk!)] p1^s1 p2^s2 . . . pk^sk

You can think of a multinomial distribution as a recursive sequence of nested binomial distributions where the number of trials and the probability of success are modified through the sequence:

s1 = Binomial(n, p1),  s2 = Binomial(n − s1, p2/(1 − p1)),  s3 = Binomial(n − s1 − s2, p3/(1 − p1 − p2)),  . . .

For example, imagine that a person being treated in hospital has three possible outcomes: {cured, not cured, deceased} with probabilities {0.6, 0.3, 0.1}. Assuming their outcomes are independent, we can model the outcome for 100 patients as follows:

Cured = Binomial(100, 0.6)
NotCured = Binomial(100 − Cured, 0.3/(0.3 + 0.1))
Deceased = Binomial(100 − Cured − NotCured, 0.1/0.1)
         = Binomial(100 − Cured − NotCured, 1)
         = 100 − Cured − NotCured

The model in Figure 8.5 shows this calculation in a spreadsheet, together with the ModelRisk distribution VoseMultinomial which achieves the same result but in a single array function.
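The nested-binomial construction is straightforward to reproduce outside a spreadsheet. A Python sketch (illustrative; numpy's own multinomial sampler is used for comparison in place of VoseMultinomial):

import numpy as np

rng = np.random.default_rng(10)
n, probs = 100, np.array([0.6, 0.3, 0.1])      # cured, not cured, deceased

cured = rng.binomial(n, probs[0])
not_cured = rng.binomial(n - cured, probs[1] / (1 - probs[0]))   # 0.3 / (0.3 + 0.1)
deceased = n - cured - not_cured               # last category is what remains

direct = rng.multinomial(n, probs)             # equivalent single multinomial draw

print(cured, not_cured, deceased)
print(direct, direct.sum())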
Figure 8.5 Model for the multinomial process.

Negative Multinomial({s1 . . . sk}, {p1 . . . pk}) is the extension of the negative binomial distribution and describes the number of extra trials (we can't really say "failures" any more because there are several outcomes, not just the two of the binomial case where we could designate success or failure) there will be to observe {s1 . . . sk} successes. There are two versions of this question: "How many extra trials will there be in total?", which has a univariate answer, and "How many extra trials will there be in each success category beyond the number required?", which has a multivariate answer. The probability mass function is quite complicated for both, but the modelling is pretty easy to see in the spreadsheet in Figure 8.6. Note in this model that there will always be one zero (in row 5 in this random scenario) and that C7 and H7 will return the same distributions.

Figure 8.6 Model for the negative multinomial process.

Dirichlet({α1 . . . αk}) is the multivariate equivalent of the beta distribution, which can be seen from its joint density function:

f(x1, . . . , xk) = [Γ(α1 + . . . + αk) / (Γ(α1) . . . Γ(αk))] * x1^(α1−1) . . . xk^(αk−1)

where 0 ≤ xi ≤ 1 (a probability lies on [0, 1]), Σ xi = 1 (the probabilities must sum to 1) and αi > 0.


We can use the Dirichlet distribution to model the uncertainty about the set of probabilities {p1 . . . pk} of a multinomial process. There is a neat relationship with gamma distributions that we can use to simulate a Dirichlet distribution, which is shown in the model of Figure 8.7, together with the VoseDirichlet function. In this example, a clinical trial of some face cream has been performed with 300 randomly selected people to ascertain the level of allergic reactions, with the following outcomes: 227 - no effect; 41 - mild itching; 27 - significant discomfort; and 5 - lots of pain and regret. The Dirichlet({s1 + 1 . . . sk + 1}) will return the joint uncertain estimate of the probability that another random person (a consumer) would experience each effect.
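The gamma construction is easy to verify in code. The Python sketch below (an illustration of the same idea, not the ModelRisk function) simulates Dirichlet(s1 + 1, . . . , sk + 1) for the face-cream data both via independent gammas and via numpy's built-in sampler.

import numpy as np

rng = np.random.default_rng(11)
observed = np.array([227, 41, 27, 5])          # no effect, itching, discomfort, pain

# Dirichlet via independent Gamma(s_i + 1, 1) draws, normalised to sum to 1
gammas = rng.gamma(shape=observed + 1, scale=1.0, size=(10_000, 4))
probs = gammas / gammas.sum(axis=1, keepdims=True)
print(probs.mean(axis=0))                      # each close to (s_i + 1) / 304

# The same thing using numpy's built-in Dirichlet sampler
print(rng.dirichlet(observed + 1, size=10_000).mean(axis=0))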

Figure 8.7 Model for the Dirichlet distribution.

8.3 The Poisson Process
In the binomial process there are n discrete opportunities for an event (a "success") to occur. In the
Poisson process there is a continuous and constant opportunity for an event to occur. For example,
lightning strikes might be considered to occur as a Poisson process during a storm. That would
mean that, in any small time interval during the storm, there is a certain probability that a lightning strike will occur. In the case of lightning strikes, the continuum of opportunity is time. However,
there are other types of exposure. The occurrence of discontinuities in the continuous manufacture of
wire could be considered to be a Poisson process where the measure of exposure is, for example,
kilometres or tonnes of wire produced. If Giardia cysts were randomly distributed in a lake, the consumption of cysts by campers drinking the water would be a Poisson process, where the measure of
exposure would be the amount of water consumed. Typographic errors in a book might be Poisson
distributed, in which case the measure of exposure could be inches of text, although one could just
as easily consider the errors to be binomially distributed with n = the number of characters in the
book.
In a Poisson process, unlike the binomial, as there is a continuum of opportunity for an event to occur
we can theoretically have anything between zero and an infinite number of events within a specific
amount of opportunity, and there is a probability of the event occurring no matter how small a unit of
exposure we might consider. In practice, few physical systems will exactly conform to such a set of
assumptions, but many systems nevertheless are very well approximated by a Poisson process. In the


Figure 8.8 Comparison of the distributions of the binomial, Poisson and hypergeometric processes.

Giardia cyst example above, assuming a Poisson process would theoretically mean that we could have any number of cysts in a volume of water, no matter how small we made that volume. Obviously, this assumption breaks down when we consider a volume of liquid around the size of a cyst, or smaller, but this is almost never a restriction in practice.
The distributions describing the Poisson and binomial processes are strongly related to each other, as shown in Figure 8.8. In a binomial process, the key descriptive parameter is p, the probability of occurrence of an event, which is the same for all trials, so the trials are independent of each other. The key descriptive parameter for the Poisson process is λ, the mean number of events that will occur per unit of exposure, which is also considered to be constant over the total amount of exposure t. That means that there is a constant probability per second, for example, of an event occurring, whether or not an event has just occurred, has not occurred for an unexpectedly long time, etc. Such a process is called "memoryless", and both the binomial and Poisson processes can be so described.
Like p for a binomial process, λ is a property of the physical system. For static systems (stochastic processes), p and λ are not variables, but we still need distributions to express the state of our knowledge (uncertainty) about their values.
In a Poisson process, we consider, together with the number of events that may occur in a period t, the amount of "time" one will have to wait to observe α events, and λ, the average number of events that could occur per unit of exposure, known as the Poisson intensity. This section will now show how the Poisson distribution, which describes the number of events α that may occur in a period of exposure t, can be derived from the binomial distribution as p tends to zero and n tends to infinity. We will then look at how to determine the variability distribution of the time t one will need to wait before observing α events, which also turns out to be the distribution of uncertainty of the time one must have waited before having observed α events. Finally, we will discuss how to determine our state of knowledge (uncertainty) about λ given a set of observed events α in a period t.


8.3.1 Deriving the Poisson distribution from the binomial
Consider a binomial process where the number of trials tends to infinity, and the probability of success at the same time tends to zero, with the constraint that the mean of the binomial distribution np remains finitely large. The probability mass function of the binomial distribution can then be adapted to model the number of successes that will occur under such conditions, as follows. Using λt = np,

p(X = x) = [n! / (x!(n − x)!)] (λt/n)^x (1 − λt/n)^(n−x)

For n large and p small,

n!/(n − x)! ≈ n^x   and   (1 − λt/n)^(n−x) ≈ e^(−λt)

which simplifies the equation to

p(X = x) = e^(−λt) (λt)^x / x!

This is the probability mass function for the Poisson(λt) distribution, i.e.

Number of events α in time t = Poisson(λt)

when the average number of events that will occur in a unit interval of exposure is known to be λ. We can see how this interpretation fits in with the derivation from the binomial distribution. Imagine that a young lady decides to buy a pair of very high platform shoes that are in fashion. After some practice she gets used to the shoes, but there remains a smallish probability (say 1 in 50) that she will fall over with each step she takes. She decides to go for a short walk, say 100 metres. If we say that each step measures 1 metre, then we can model the number of falls she will have on her walk as either Binomial(100, 2 %) or Poisson(100 * 0.02) = Poisson(2). Figure 8.9 plots these two distributions together and shows how closely the binomial distribution is approximated by the Poisson distribution in such limiting cases.
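The closeness of the approximation is easy to check. A brief Python sketch for the example above (scipy assumed):

import numpy as np
from scipy.stats import binom, poisson

n, p = 100, 0.02                               # 100 steps, 2 % chance of a fall per step
k = np.arange(8)

print(binom.pmf(k, n, p).round(4))             # exact binomial probabilities
print(poisson.pmf(k, n * p).round(4))          # Poisson(2) approximation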
The Poisson distribution is often mistakenly considered to be only a distribution of rare events. It is
certainly used in this sense to approximate a binomial distribution, but has far more importance than
that. Where there is a continuum of exposure to an event, the measure of exposure can be split up into
smaller and smaller divisions, until the probability of the event occurring in each division has become
extremely small, while there are also an enormous number of divisions. For example, I could stand on
a street corner during rush hour, looking for red cars to pass by. For the duration of the rush hour,
one could consider that the frequency of cars going by is quite constant, and that the red cars in the
traffic are randomly distributed among the city's traffic. Then the number of red cars passing by will

Figure 8.9 Comparison of the Binomial(100, 0.02) and Poisson(2) distributions.

be Poisson distributed. If, on average, 0.6 red cars passed by per minute, I could model the number of cars passing by: in the next 10 seconds as Poisson(0.1); in the next hour as Poisson(36), etc. I could divide up the time I stand on the street corner into such tiny elements (for example 1/100th of a second) that the probability of a red car passing by within a particular 1/100th of a second would be extremely small. The probability would be so small that the chance of two cars going by within that period would be absolutely negligible. In such circumstances, we can consider each of these small elements of time to be independent Bernoulli trials. Similarly, the number of raindrops falling on my head each second during a shower would also be Poisson distributed.

8.3.2 "Time" to wait to observe α events
The Poisson process assumes that there is a constant probability that an event will occur per increment of time. If we consider a small element of time Δt, then the probability an event will occur in that element of time is kΔt, where k is some constant. Now let P(t) be the probability that the event will not have occurred by time t. The probability that an event occurs for the first time during the small interval Δt after time t is then kΔt P(t). This is also equal to P(t) − P(t + Δt), so we have

kΔt P(t) = P(t) − P(t + Δt)

Making Δt infinitesimally small, this becomes the differential equation

dP(t)/dt = −kP(t)

Integration gives

P(t) = e^(−kt)


If we define F(t) as the probability that the event will occur before time t (i.e. the cumulative distribution function for t), we then have

F(t) = 1 − e^(−kt)

which is the cumulative distribution function for an exponential distribution Expon(1/k) with mean 1/k. Thus, 1/k is the mean time between occurrences of events or, equivalently, k is the mean number of events per unit time, which is the Poisson parameter λ. The parameter 1/λ, the mean time between occurrences of events, is given the notation β.
We have thus shown that the time until occurrence of the first event in a Poisson process is given by

t1 = Expon(β)

where β = 1/λ. It can also be shown (although the maths is too laborious to repeat here) that the time until α events have occurred is given by a gamma distribution:

tα = Gamma(α, β)

The Expon(β) distribution is therefore simply a special case of the gamma distribution, namely

Expon(β) = Gamma(1, β)

It is interesting to check the idea that a Poisson process is "memoryless". The probability density for the first event occurring at time x, given that it has not yet occurred by time t (x > t), is given by

f(x | x > t) = λ e^(−λ(x−t))

which is another exponential distribution. Thus, although the event may not have occurred after time t, the remaining time until it will occur has the same probability distribution as it had at any prior point in time.

8.3.3 Estimate of the mean number of events per period (Poisson intensity) λ
Like the binomial probability p, the mean number of events per period λ is a fundamental property of the stochastic system in question. It can never be observed and it can never be exactly known. However, we can become progressively more certain about its value as more data are collected. Bayesian inference (see Section 9.2) provides us with a means of quantifying the state of our knowledge as we accumulate data.
Assuming an uninformed prior π(λ) = 1/λ (see Section 9.2.2) and the Poisson likelihood function for observing α events in period t:

l(α|λ) ∝ e^(−λt) λ^α

since we can ignore terms that don't involve λ, we then get the posterior distribution

f(λ) ∝ λ^(α−1) e^(−λt)


which is a Gamma(α, 1/t) distribution. The gamma distribution can also be used to describe our uncertainty about λ if we start off with an informed opinion and then observe α events in time t. From Table 9.1, if we can reasonably describe our prior belief with a Gamma(a, b) distribution, the posterior is given by a Gamma(a + α, b/(1 + bt)) distribution.
The choice of π(λ) = 1/λ (which is equivalent to a Gamma(1/z, z) distribution, where z is extremely large) as an uninformed prior is an uncomfortable one for many. This prior makes mathematical sense in that it is transformation invariant and therefore would give the same answer whether one performed an analysis from the point of view of λ or β = 1/λ, or even changed the unit of exposure relating to λ. On the other hand, a plot of this prior doesn't really seem "uninformed", since it is so peaked at zero. However, the shape of the posterior gamma distribution becomes progressively less sensitive to the prior distribution as data are collected. We can get a feel for the importance of the prior with the following train of thought:

(i) A π(λ) = 1/λ prior is equivalent to Gamma(1/z, z), where z approaches infinity. You can prove this by looking at the gamma probability distribution function and setting α to zero and β to infinity.
(ii) A flat prior (the opposite extreme to the π(λ) = 1/λ prior) would be equivalent to a Gamma(1, z), where z approaches infinity, i.e. an infinitely drawn out exponential distribution.
(iii) We have seen that, for a Gamma(a, b) prior, the resultant posterior is Gamma(a + α, b/(1 + bt)), which means that the posterior for (i) would be Gamma(α, 1/t), and the posterior for (ii) would be Gamma(α + 1, 1/t).
(iv) Thus, the sensitivity of the gamma distribution to the prior amounts to whether (α + 1) is approximately the same as α. Moreover, Gamma(α, β) is the sum of α independent Exponential(β) distributions, so one can think of the choice of priors as being whether or not we add one extra exponential distribution to the α exponential distributions from the data. Thus, if α were 100, for example, the distribution would be roughly 1 % influenced by the prior and 99 % influenced by the data.

8.3.4 Estimate of the elapsed period t
We can estimate the period t that has elapsed if we know λ and the number of events α that have occurred in time t. The maths turns out to be exactly the same as the estimate for λ in the previous section. The reader may like to verify that, by using a prior of π(t) = 1/t, we obtain a posterior distribution t = Gamma(α, 1/λ), which is the same result we would obtain if we were trying to predict forward (i.e. determine a distribution of variability of) the time required to observe α events given λ = 1/β. Also, if we can reasonably describe our prior belief with a Gamma(a, b) distribution, the posterior is given by a Gamma(a + α, b/(1 + bλ)) distribution.

8.3.5 Summary of results for the Poisson process
The results are shown in Table 8.2.

8.3.6 The multivariate Poisson process
The properties of the Poisson process make extending to a multivariate situation very easy. Imagine that
we have three categories of car accident: (a) no injury; (b) one or more persons injured but no fatalities;


Table 8.2 Distributions of the Poisson process.

Quantity                                   Formula                               Notes
Number of events                           α = Poisson(λt)
Mean number of events per unit exposure    λ = Gamma(α, 1/t)                     Assuming uninformed prior
                                             = Gamma(a + α, b/(1 + bt))          Assuming Gamma(a, b) prior
Time until observation of first event      t1 = Expon(1/λ) = Gamma(1, 1/λ)
Time until observation of first α events   tα = Gamma(α, 1/λ)
Time that has elapsed for α events         tα = Gamma(α, 1/λ)                    Assuming uninformed prior
                                             = Gamma(a + α, b/(1 + bλ))          Assuming Gamma(a, b) prior
(c) one or more persons killed. We'll assume that the accidents occur independently and follow a Poisson process with expected occurrences λa, λb and λc per year. The number that will occur in the next T years (assuming that the rates won't change over time) is Poisson(T * (λa + λb + λc)). The probability that the next accident is of type (a) is

λa / (λa + λb + λc)

The time until the next type (a) accident is

Gamma(1, 1/λa) = Expon(1/λa)

and the uncertainty about the true values of each λ can be estimated separately as described in Sections 8.3.3 and 9.1.5.

8.3.7 Modifying λ in a Poisson process
The Poisson model assumes that λ will be constant over the time in which we are counting. That can be a tenuous assumption. Hurricanes, disease outbreaks, suicides, etc., occur more frequently at certain times of the year; car accidents, robberies and high-street brawls occur more frequently at certain times of the day (and sometimes year too). In fact it turns out that, if λ has a consistent (even if unknown) seasonal variation, we can often get round the problem. Imagine that boat accidents occur in each month i at a rate λi, i = 1, . . . , 12. The number occurring in each future month i will be αi = Poisson(λi), and the total over the year will be Σ(i=1 to 12) Poisson(λi). From the identity Poisson(a) + Poisson(b) = Poisson(a + b), this equation can be rewritten as Poisson(Σ(i=1 to 12) λi), i.e. the boat accidents occurring in a year also follow a Poisson process. Thus, as long as we ensure that we analyse data over a complete number of seasonal periods (a whole number of years in this case) and predict for a whole number of seasonal periods, we can ignore the fact that λ changes seasonally. That is immensely useful. If I've observed that historically there have been an average of 23 outbreaks per year of campylobacteriosis in a city (an outbreak is defined in epidemiology as an event unconnected to others, so we can think of them as occurring randomly in time and independently), then I can model the number of outbreaks next year as Poisson(23) without worrying that most of those will occur over the summer months. I can also compare year-on-year data on outbreaks using Poisson mathematics. What I cannot do, of course, is say that July will have Poisson(23/12) outbreaks.
I used to live in a rural area of the South of France. As winter approached, the first time there was
black ice on the roads in the morning you would see cars buried in hedges, woods and fields along
the roadside. The more intense the sudden cold snap, the more cars you would see. Some years there
weren't so many, others it was mayhem. Clearly in situations like this the expected rate of accidents
is itself a random variable. The most common way to model that random variation is to multiply λ by a Gamma(1/h², h²) distribution. This gamma has a mean of 1 and a standard deviation of h, giving a Poisson intensity of Gamma(1/h², h²λ). The idea therefore is that the gamma distribution is just adding a coefficient of variation of h to λ. It turns out that the combination of these two distributions is a Pólya(1/h², h²λ) or, if 1/h² is an integer, simplifies to a NegBin(1/h², 1/(1 + h²λ)). The result is convenient because it means we can use the Pólya or NegBin distributions to model this Poisson(λ)-Gamma(α, β) mixture. Along the way, we can also see that the Pólya and NegBin distributions have a greater coefficient of variation than the Poisson. Often you will see in statistics that researchers call the data "overdispersed" when they want to fit a Poisson distribution because the data have a variance greater than their mean (they would be equal for a Poisson distribution), and the statisticians then turn to a NegBin (although they would be better off with a Pólya, but it is less well known).
The gamma distribution is useful because we have an extra parameter h to play with and can therefore match, for example, the mean and variance (or any two other statistics) to data. However, at times that is not enough, and we might need more control to match, for example, the skewness too. Instead of modelling λ in the form Gamma(a, b), we can add a positive shift so we get Poisson(Gamma(a, b) + c), which turns out to be a Delaporte(a, b, c) distribution.
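The mixture result can be checked by simulation. The Python sketch below assumes the Gamma(1/h², h²) parameterisation described above (treat the parameter algebra as illustrative): the gamma-mixed Poisson shows the overdispersion and matches a direct negative binomial draw.

import numpy as np

rng = np.random.default_rng(13)
N, lam, h = 200_000, 4.0, 0.5                  # base intensity and extra coefficient of variation

# Poisson with a gamma-mixed intensity: mean-1 gamma with sd h, scaled by lam
mixed_rate = rng.gamma(shape=1 / h**2, scale=h**2 * lam, size=N)
counts = rng.poisson(mixed_rate)
print(counts.mean(), counts.var())             # variance now exceeds the mean

print(rng.poisson(lam, N).var())               # a plain Poisson: variance ~ mean

# Equivalent direct draw: NegBin(1/h^2, 1/(1 + h^2 * lam))
negbin = rng.negative_binomial(1 / h**2, 1 / (1 + h**2 * lam), size=N)
print(negbin.mean(), negbin.var())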

8.4 The Hypergeometric Process
The hypergeometric process occurs when one is sampling randomly without replacement from some
population, and where one is counting the number in that sample that have some particular characteristic.
This is a very common type of scenario. For example, population surveys, herd testing and lotto are
all hypergeometric processes. In many situations the population is very large in comparison with the
sample, and we can assume that, if a sample were put back into the population, the probability is very
small that it would be picked again. In that case, each sample would have the same probability of picking
an individual with a particular characteristic: in other words, this becomes a binomial process. When
the population is not very large compared with the sample (a good rule is that the population is less
than 10 times the size of the sample), we cannot make a binomial approximation to the hypergeometric.
This section discusses the distributions associated with the hypergeometric process.

8.4.1 Number in a sample with a particular characteristic
Consider a group of M individual items, D of which have a certain characteristic. Randomly picking
n items from this group without replacement, where each of the M items has the same probability of
being selected, is a hypergeometric process. For example, imagine I have a bag of seven balls, three of
which are red, the other four of which are blue. What is the probability that I will select two red balls
from the bag if I randomly pick three balls out without replacement?


First of all, we note that the probability of the second ball picked being red depends on the colour of the first picked ball. If the first ball was red (with probability 3/7), there are only two red balls left of the six balls remaining. The probability of the second ball being red, given the first ball was red, is therefore 2/6 = 1/3. However, each ball remaining in the bag has the same probability of being picked, which means that each event resulting in x red balls being selected in total has the same probability. We thus need only consider the different combinations of events that are possible. There are, from the discussion in Section 6.3.4, 7C3 = 35 different possible ways that one can select three items from seven. There are 3C2 = 3 ways to select two red balls from the three in the bag, and there are 4C1 = 4 ways to select one blue ball from the four in the bag. Thus, out of the 35 ways we could have picked three balls from the group of seven, only 3C2 * 4C1 = 3 * 4 = 12 of those ways would give us two red balls. Thus, the probability of selecting two red balls is 12/35 = 34.29 %.
In general, for a population size M of which D have the characteristic of interest, in selecting a sample of size n from that population at random without replacement, the probability of observing x with the characteristic of interest is given by

P(x) = (DCx * (M−D)C(n−x)) / MCn

which is the probability mass function of the hypergeometric distribution Hypergeo(n, D, M). Just in case you are curious, the hypergeometric distribution gets its name because its probabilities are successive terms in a Gaussian hypergeometric series.
Binomial approximation to the hypergeometric

If we replaced each item one at a time back into the population when taking our sample of size n, the probability of each individual item having the characteristic of interest would be D/M and the number of times we sampled from D would then be given by a Binomial(n, D/M). More usefully, if M is very large compared with n, the chance of picking the same item more than once if one were to replace the item after each selection would be very small. Thus, for large M (usually n < 0.1M is quoted as being a satisfactory condition) there will be little difference in our sampling result whether we sample with or without replacement, and we can approximate a Hypergeo(n, D, M) with a Binomial(n, D/M), which is much easier to calculate.
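A short Python check of both calculations (note that scipy's hypergeom takes the population size, the number of items with the characteristic and the sample size, in that order):

from scipy.stats import binom, hypergeom

# Two red balls when picking three from a bag of seven containing three red
print(hypergeom.pmf(2, 7, 3, 3))               # 12/35 = 0.342857...

# The binomial approximation is reasonable when the sample is small relative to M
M, D, n = 10_000, 3_000, 30
print(hypergeom.pmf(10, M, D, n), binom.pmf(10, n, D / M))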
Multivariate hypergeometric distribution

The hypergeometric distribution can be extended to situations where there are more than two types of
item in the population (i.e. more than D of one type and (M - D) of another). The probability of getting
s1 from D1, s2 from D2, etc., all in the sample n is given by

\[
P(s_1, s_2, \ldots, s_k) = \frac{\prod_{i=1}^{k}\binom{D_i}{s_i}}{\binom{M}{n}}
\]

where

\[
\sum_{i=1}^{k} s_i = n, \qquad \sum_{i=1}^{k} D_i = M, \qquad D_i \ge s_i \ge 0, \qquad M > D_i > 0.
\]
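For readers who want to sample from this distribution directly, numpy's random generator includes a multivariate hypergeometric sampler; the sketch below (an illustration only, with made-up counts D1, D2, D3) draws a few samples without replacement from a population of three item types.

# Sample from a multivariate hypergeometric: k types of item with counts Di,
# draw n items without replacement. Illustrative counts only.
import numpy as np

rng = np.random.default_rng(seed=1)
D = [10, 20, 15]        # D1, D2, D3 (so M = 45)
n = 12                  # sample size

samples = rng.multivariate_hypergeometric(D, n, size=5)
print(samples)          # each row (s1, s2, s3) sums to n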


8.4.2 Number of samples to get a specific s
Consider the situation where we are sampling without replacement from a population M with D items
with the characteristic of interest until we have s items with the required characteristic. The distribution
of the number of failures we will have before the sth success can be calculated in the same
manner as we developed for the negative binomial distribution in Section 8.2.2. The probability of
observing (s - 1) successes in (x + s - 1) trials (i.e. x failures) is given by direct application of the
hypergeometric distribution:

\[
P(s-1 \text{ successes in } x+s-1 \text{ trials}) = \frac{\binom{D}{s-1}\binom{M-D}{x}}{\binom{M}{x+s-1}}
\]

The probability p of then observing a success in the next trial (the (s + x)th trial) is simply the
number of D items remaining (= D - (s - 1)) divided by the size of the population remaining
(= M - (s + x - 1)):

\[
p = \frac{D - s + 1}{M - s - x + 1}
\]

and the probability of having exactly x failures up to the sth success, where trials are stopped at the sth
success, is then the product of these two probabilities:

\[
f(x) = \frac{\binom{D}{s-1}\binom{M-D}{x}}{\binom{M}{x+s-1}} \cdot \frac{D - s + 1}{M - s - x + 1}
\]

This is the probability mass function for the inverse hypergeometric distribution InvHypergeo(s, D, M)
and is analogous to the negative binomial distribution for the binomial process and the gamma distribution for the Poisson process.

For a population M that is large compared with s, the inverse hypergeometric distribution approximates
the negative binomial:

\[
\text{InvHypergeo}(s, D, M) \approx \text{NegBin}(s, D/M)
\]

and if the probability D/M is very small

\[
\text{InvHypergeo}(s, D, M) \approx \text{Gamma}(s, M/D)
\]

Figure 8.10 shows some examples of the inverse hypergeometric distribution. An inverse hypergeometric distribution shifted k units along the domain is sometimes called a negative hypergeometric
distribution. ModelRisk offers the InvHypergeo(s, D, M) distribution, and the negative hypergeometric
can be achieved by writing VoseInvHypergeo(s, D, M, VoseShift(k)).
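The InvHypergeo distribution is also easy to simulate directly from its definition (sampling without replacement until the sth success) without any special function. The sketch below is an illustrative Python version of that logic, not the ModelRisk implementation, using the InvHypergeo(2, 2, 50) case plotted in Figure 8.10.

# Simulate InvHypergeo(s, D, M): number of failures before the sth success
# when sampling without replacement. Minimal sketch of the definition above.
import numpy as np

def inv_hypergeo(s, D, M, rng):
    population = np.array([1] * D + [0] * (M - D))    # 1 = has the characteristic
    rng.shuffle(population)                           # random sampling order
    successes = np.cumsum(population)
    n_trials = np.argmax(successes == s) + 1          # trials needed for the sth success
    return n_trials - s                               # failures before the sth success

rng = np.random.default_rng(seed=1)
draws = [inv_hypergeo(2, 2, 50, rng) for _ in range(10000)]
print("mean failures before 2nd success:", np.mean(draws))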


Figure 8.10 Examples of the inverse hypergeometric distribution: InvHypergeo(2, 2, 50) and InvHypergeo(4, 5, 50), plotted as probability against the number of failures (0 to 50).

8.4.3 Number of samples to have observed a specific s
The inverse hypergeometric distribution was derived above as a distribution of variability in predicting
the number of failures one will have before the sth success. However, it can equally be derived as a
distribution of uncertainty about the number of failures x = n - s one must have had if one knows
s, M and D, using Bayes' theorem and a uniform (i.e. uninformed) prior on x, so that
n = s + InvHypergeo(s, D, M).
In the case where you do not know that the trials had stopped with the sth success, we can still
apply Bayes' theorem with a uniform prior for x and a likelihood function given by a hypergeometric
probability:

\[
l(x) = \frac{\binom{D}{s}\binom{M-D}{x}}{\binom{M}{x+s}}
\]

which, with a uniform prior, is also the posterior distribution. Substituting n - s for x yields

\[
f(n) \propto \frac{n!\,(M-n)!}{(n-s)!\,(M-D-n+s)!}
\tag{8.6}
\]

The values of f(n) calculated in the spreadsheet of Figure 8.11 are:

n       f(n)
3       6.5E-04
4       1.4E-03
5       1.7E-03
6       1.6E-03
7       1.2E-03
8       7.9E-04
9       4.2E-04
10      1.9E-04
11      6.4E-05
12      1.5E-05
13      2.0E-06
total   8.0E-03

Figure 8.11 A Bayesian inference model with hypergeometric uncertainty. Note that the discrete distribution
could have been used with columns B and C, removing the necessity to normalise the distribution.

Equation (8.6) has dropped all the terms that are not a function of n, since they can be normalised out
of the equation. The uncertainty distribution for n doesn't equate to a standard distribution, so it needs
to be normalised manually; it is easiest to work with Equation (8.6) and normalise in the
spreadsheet. Figure 8.11 shows an example of such a calculation where the final distribution is in cell
G18. Note that, if one uses a discrete distribution as shown in this spreadsheet, it is actually unnecessary
to normalise the probabilities, since software like @RISK, Crystal Ball and ModelRisk automatically
normalises them to sum to unity.
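The same normalisation is easy to do outside a spreadsheet. The Python sketch below is an illustration of the calculation only; the values of M, D and s are hypothetical and are not those behind Figure 8.11. It uses the full hypergeometric likelihood, which is proportional to Equation (8.6) in n, so the normalised posterior is identical.

# Uncertainty about n (Section 8.4.3): with a uniform prior the posterior for n
# is proportional to the hypergeometric likelihood, i.e. to Equation (8.6).
# M, D and s are hypothetical illustration values.
from math import comb

M, D, s = 50, 13, 3
ns = range(s, M - D + s + 1)                        # feasible values of n
weights = [comb(D, s) * comb(M - D, n - s) / comb(M, n) for n in ns]
total = sum(weights)
posterior = {n: w / total for n, w in zip(ns, weights)}
print("check total:", sum(posterior.values()))      # 1.0 after normalisation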

8.4.4 Estimate of population and subpopulation sizes
The sizes of D and M are fundamental properties of the stochastic system, like p for a binomial process
and λ for a Poisson process. Distributions of our uncertainty about the values of these parameters can
be determined from Bayesian inference, given a certain sample size n taken from the population M, of
which s belonged to the subpopulation D. The hypergeometric probability of s successes in n samples
from a population M of which D have the characteristic of interest is given by Equation (8.5) as

\[
l(s) = \frac{\binom{D}{s}\binom{M-D}{n-s}}{\binom{M}{n}}
\]


So, with a uniform prior, we get the following posterior distributions for D and M:

\[
P(D) \propto \binom{D}{s}\binom{M-D}{n-s} \propto \frac{D!\,(M-D)!}{(D-s)!\,(M-D-n+s)!}
\]

\[
P(M) \propto \frac{\binom{D}{s}\binom{M-D}{n-s}}{\binom{M}{n}} \propto \frac{(M-D)!\,(M-n)!}{(M-D-n+s)!\,M!}
\]

These formulae do not equate to standard distributions and need to be normalised in the same way as
discussed for Equation (8.6).
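As an illustration of that normalisation for the population size M (the logic of a capture-recapture estimate), here is a minimal Python sketch. The uniform prior must be truncated somewhere; the upper limit Mmax is an assumption of this sketch, and the values of D, n and s are illustration values only.

# Posterior for the population size M (Section 8.4.4) with a uniform prior
# truncated at Mmax. D, n, s and Mmax are illustration values / assumptions.
from math import comb

D, n, s = 20, 30, 7
Mmax = 500
Ms = list(range(D + n - s, Mmax + 1))     # smallest M that can hold D tagged and n-s untagged
weights = [comb(D, s) * comb(M - D, n - s) / comb(M, n) for M in Ms]
total = sum(weights)
posterior = [w / total for w in weights]
print("posterior mode of M:", Ms[posterior.index(max(posterior))])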

8.4.5 Summary of results for the hypergeometric process
The results are shown in Table 8.3.

Table 8.3  Distributions of the hypergeometric process.

Quantity                                  Formula                                   Notes
Number of subpopulation in the sample     s = Hypergeo(n, D, M)
Number of samples n to observe s from     n = s + InvHypergeo(s, D, M)              Where the last sample is known to have
the subpopulation                                                                   been from the subpopulation
Number of samples n there were to have    f(n) ∝ n!(M-n)! /                         Where the last sample is not known to
observed s from the subpopulation           [(n-s)!(M-D-n+s)!]                      have been from the subpopulation.
                                                                                    This uncertainty distribution needs
                                                                                    to be normalised
Size of subpopulation D                   f(D) ∝ D!(M-D)! /                         This uncertainty distribution needs to
                                            [(D-s)!(M-D-n+s)!]                      be normalised
Size of population M                      f(M) ∝ (M-D)!(M-n)! /                     This uncertainty distribution needs to
                                            [(M-D-n+s)!M!]                          be normalised

8.5 Central Limit Theorem
The central limit theorem (CLT) is an asymptotic result of summing probability distributions. It turns
out to be very useful for obtaining sums of individuals (e.g. sums of animal weights, yields, scraps).
It also explains why so many distributions sometimes look like normal distributions. We won't look at
the derivation, just see some examples and its use.


The sum C of n independent random variables Xi (where n is large), all of which have the same distribution, will asymptotically approach a normal distribution with known mean and standard deviation:

\[
C = \sum_{i=1}^{n} X_i \approx \text{Normal}\!\left(n\mu, \sigma\sqrt{n}\right) \tag{8.7}
\]

where μ and σ are the mean and standard deviation of the distribution from which the n samples are
drawn.

8.5.1 Examples
Imagine that the distribution of the weight (read "mass" if you want to be technical) of random nails
produced by some company has a mean of 27.4 g and a standard deviation of 1.3 g. What will be the
weight of a box of 100 nails? The answer, assuming that the nail weight distribution isn't very skewed,
is the following normal distribution:

\[
\text{Normal}\!\left(100 \times 27.4, \; 1.3 \times \sqrt{100}\right) = \text{Normal}(2740 \text{ g}, 13 \text{ g})
\]
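A quick Monte Carlo check of this result is shown below as an illustrative Python sketch. The book does not specify the shape of the single-nail distribution, so the gamma used here is an assumption chosen only to give the stated mean and standard deviation; the sum is insensitive to that choice, which is the point of the CLT.

# Monte Carlo check of the CLT result for the box of 100 nails:
# the sum should be close to Normal(2740 g, 13 g).
import numpy as np

rng = np.random.default_rng(seed=1)
mu, sd, n = 27.4, 1.3, 100
shape = (mu / sd) ** 2                  # gamma with this shape and scale has the
scale = sd ** 2 / mu                    # required mean 27.4 g and std dev 1.3 g
box_weights = rng.gamma(shape, scale, size=(50000, n)).sum(axis=1)
print(box_weights.mean(), box_weights.std())   # close to 2740 and 13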

This CLT result turns out to be very important in risk analysis. Many distributions are the sum of a
number of identical random variables, and so, as that sum gets larger, the distribution tends to look like
a normal distribution. For example, Gamma(α, β) is the sum of α independent Expon(β) distributions,
so, as α gets larger, the gamma distribution looks progressively more like a normal distribution. An
Expon(β) distribution has mean and standard deviation both equal to β, so we have

\[
\text{Gamma}(\alpha, \beta) \approx \text{Normal}\!\left(\alpha\beta, \beta\sqrt{\alpha}\right) \quad \text{for large } \alpha
\]

Other examples are discussed in the section on approximating one distribution with another.

How large does n have to be for the sum to be distributed normally?

Uniform                  12    (try it: an old way of generating normal distributions)
Symmetric triangular      6    (because U(a, b) + U(a, b) = Triangle(2a, a + b, 2b))
Normal                    1
Fairly skewed           30+    (e.g. 30 lots of Poisson(2) = Poisson(60))
Exponential            150+    (check with Gamma(a, b) = sum of a Exponential(b)s)

8.5.2 Other related results
The average of a large number of independent, identical distributions

Dividing both sides of Equation (8.7) by n, the average of n variables drawn independently from the
same distribution is given by

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \approx \frac{\text{Normal}\!\left(n\mu, \sigma\sqrt{n}\right)}{n} = \text{Normal}\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \tag{8.8}
\]


Note that the result of Equation (8.8) is correct because both the mean and standard deviation of the
normal distribution are in the same units as the variable itself. However, be warned that for most
distributions one cannot simply divide the distribution parameters of a variable X by n to get the
distribution of X/n.
The product of a large number of independent, identical distributions

CLT can also be applied where a large number of identical random variables are being multiplied
together. Let P be the product of a large number of random variables Xi, i = 1, ..., n, i.e.

\[
P = \prod_{i=1}^{n} X_i
\]

Taking logs of both sides, we get

\[
\ln P = \sum_{i=1}^{n} \ln X_i
\]

The right-hand side is the sum of a large number of random variables and will therefore tend to a normal
distribution. Thus, from the definition of a lognormal distribution, P will be asymptotically lognormally
distributed.
A neat result from this is that, if all Xi are lognormally distributed, their product will also be
lognormally distributed.
Is CLT the reason the normal distribution is so popular?

Many stochastic variables are neatly described as the sum or product, or a mixture, of a number of
random variables. A very loose form of CLT says that, if you add up a large number n of different
random variables, and if none of those variables dominates the resultant distribution spread, the sum will
eventually look normal as n gets bigger. The same applies to multiplying (positive) different random
variables and the lognormal distribution. In fact, a lognormal distribution will also look very similar
to a normal distribution if its mean is much larger than its standard deviation (see Figure 8.12), so
perhaps it should not be too surprising that so many variables in nature seem to be somewhere between
lognormally and normally distributed.

8.6 Renewal Processes
In a Poisson process, the times between successive events are described by independent identical exponential distributions. In a renewal process, like a Poisson process, the times between successive events
are independent and identical, but they can take any distribution. The Poisson process is thus a particular
case of a renewal process. The mathematics of the distributions of the number of events in a period
(equivalent to the Poisson distribution for the Poisson process) and the time to wait to observe x events


Figure 8.12 Graphs of the normal and lognormal distribution.

(equivalent to the gamma distribution in the Poisson process) can be quite complicated, depending on the
distribution of time between events. However, Monte Carlo simulation lets us bypass the mathematics
to arrive at both of these distributions, as we will see in the following examples.
Example 8.1 Number of events in a specific period

It is known that a certain type of light bulb has a lifetime that is Weibull(1.3, 4020) hours distributed.
(a) If I have one light bulb working at all times, replacing each failed light bulb immediately with
another, how many light bulbs will have failed in 10 000 hours? (b) If I have 10 light bulbs going at
all times, how many will fail in 1000 hours? (c) If I had one light bulb going constantly, and I had 10
light bulbs to use, how long would it take before the last light bulb failed?
(a) Figure 8.13 shows a model to provide the solution to this question. Note that it takes account of
the possibility of 0 failures.
(b) Figure 8.14 shows a model to provide the solution to this question. Figure 8.15 compares the results
for this question and part (a). Note that they are significantly different. Had the time between events
been exponentially distributed, the results would have been exactly the same.
(c) The answer is simply the sum of 10 independent Weibull(1.3, 4020) distributions.
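The counting logic of part (a) is also easy to reproduce outside the spreadsheet. The following Python sketch is an illustration of the renewal count (accumulate lifetimes until the period is exceeded), not the Figure 8.13 model itself.

# Renewal count for Example 8.1(a): Weibull(1.3, 4020)-hour bulb lifetimes,
# counting how many fail within 10 000 hours (0 failures is possible).
import numpy as np

rng = np.random.default_rng(seed=1)
shape, scale, period = 1.3, 4020.0, 10000.0

def failures_in_period():
    t, count = 0.0, 0
    while True:
        t += scale * rng.weibull(shape)    # lifetime of the bulb currently in use
        if t > period:
            return count
        count += 1

counts = [failures_in_period() for _ in range(20000)]
print("mean failures in 10 000 hours:", np.mean(counts))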

Figure 8.13 Model solution to Example 8.1(a). (Period of interest: 10 000 hours; each potential failure is counted with a formula of the form =IF(B3>$D$21,0,1).)

Figure 8.14 Model solution to Example 8.1(b). (Formulae table: =VoseWeibull(1.3, 4020); B4:B19, E4:E19, etc. =B3+VoseWeibull(1.3, 4020).)


Figure 8.15 Comparison of results from the models of Figures 8.13 and 8.14.

8.7 Mixture Distributions
Sometimes a stochastic process can be a combination of two or more separate processes. For example,
car accidents at some particular place and time could be considered to be a Poisson variable, but the
mean number of accidents per unit time λ may be a variable too, as we have seen in Section 8.3.7.
A mixture distribution can be written symbolically as FA(Θ), where the parameter Θ is itself a random
variable with distribution FB: FA represents the base distribution and FB represents the mixing distribution, i.e. the distribution
of Θ. So, for example, we might have a Poisson(λ) distribution where λ = Gamma(α, β),
which reads as "a gamma mixture of Poisson distributions".
There are a number of commonly used mixture distributions. For example, a Binomial(n, p) where
p = Beta(α, β), which is the Beta-Binomial(n, α, β) distribution, and a Poisson(λ)
where the Poisson distribution has parameter λ = φ · p, and p = Beta(α, β). [Though also used in
biology, this should not be confused with the beta-Poisson dose-response model.]


The cumulative distribution function for a mixture distribution with parameters Θi is given by

\[
F(x) = \mathbb{E}_{\Theta}\!\left[F_A\!\left(x \mid \Theta_1, \Theta_2, \ldots\right)\right]
\]

where the expectation is with respect to the parameters that are random variables. Thus, the functional
form of mixture distributions can quickly become extremely complicated or even intractable. However,
Monte Carlo simulation allows us very simply to include mixture distributions in our model, providing
the Monte Carlo software being used (for example, @RISK, Crystal Ball, ModelRisk) generates samples
for each iteration in the correct logical sequence. So, for example, a Beta-Binomial(n, α, β) distribution
is easily generated by writing =Binomial(n, Beta(α, β)). In each iteration, the software generates a value
first from the beta distribution, then creates the appropriate binomial distribution using this value of p
and finally samples from that binomial distribution.
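The same two-stage logic is trivial to reproduce in code. The sketch below is an illustrative Python version (the parameter values are arbitrary), and the comparison with the theoretical Beta-Binomial mean is just a sanity check on the construction.

# Generate a Beta-Binomial(n, alpha, beta) by the two-stage mixture logic:
# draw p from the beta distribution, then draw from Binomial(n, p).
import numpy as np

rng = np.random.default_rng(seed=1)
n, alpha, beta = 20, 2.0, 5.0
p = rng.beta(alpha, beta, size=100000)        # step 1: mixing distribution
x = rng.binomial(n, p)                        # step 2: binomial using that p
print(x.mean(), n * alpha / (alpha + beta))   # simulated vs theoretical mean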

8.8 Martingales
A martingale is a stochastic process with sequential variables Xi (i = 1, 2, ...), where the expected value
of each variable is the same and independent of previous observations. Written more formally,

\[
\mathbb{E}\!\left[X_{n+1} \mid X_1, X_2, \ldots, X_n\right] = \mathbb{E}\!\left[X_n\right]
\]

Thus, a martingale is any stochastic process with a constant mean. The theory was originally developed
to demonstrate the fairness of gambling games, i.e. to show that the expected winnings of each turn of
a game were constant; for example, to show that remembering the cards that had already been played
in previous hands of a card game wouldn't impact upon your expected winnings. [Next time a friend
says to you "21 hasn't come up in the lottery numbers for ages, so it must show soon", you can tell
him or her "Not true, I'm afraid, it's a martingale" - they'll be sure finally to understand]. However,
the theory has proven to be of considerable value in many real-world problems.
A martingale gets its name from the gambling "system" of doubling your bet on each loss of an even
odds bet (e.g. betting Red or Impaire at the roulette wheel) until you have a win. It works too - well,
in theory anyway. You must have a huge bankroll, and the casino must have no bet limit. It gives low
returns for high risk, so as risk analysis consultants we would advise you to invest in (gamble on) the
stock market instead.

8.9 Miscellaneous Examples

I have given below a few example problems for different random processes discussed in this chapter to
give you some practice.

8.9.1 Binomial process problems
In addition to the problems below, the reader will find the binomial process appearing in the following
examples distributed through this book: examples in Sections 4.3.1, 4.3.2 and 5.4.6 and Examples 22.2
to 22.6, 22.8 and 22.10, as well as many places in Chapter 9.


Example 8.2 Wine sampling

Two wine experts are each asked to guess the year of 20 different wines. Expert A guesses 11 correctly,
while expert B guesses 14 correctly. How confident can we be that expert B is really better at this
exercise than expert A?
If we allow that the guess of the year for each wine tasted is independent of every other guess, we
can assume this to be a binomial process. We are thus interested in whether the probability of one expert
guessing correctly is greater than the other's. We can model our uncertainty about the true probability
of success for expert A as Beta(12, 10) and for expert B as Beta(15, 7). The model in Figure 8.16 then
randomly samples from the two distributions and cell C5 returns a 1 if the distribution for expert B has
a greater value than the distribution for expert A. We run a simulation on this cell, and the mean result
equals the percentage of time that the distribution for expert B generated a higher value than for expert
A, and thus represents our confidence that expert B is indeed better at this exercise. In this case, we are
83 % confident. +
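For readers without the spreadsheet to hand, the same comparison can be sketched in a few lines of Python; this reproduces the logic of Figure 8.16 (comparing random draws from the two beta uncertainty distributions), not the spreadsheet itself.

# Example 8.2 by simulation: confidence that expert B's true success
# probability exceeds expert A's.
import numpy as np

rng = np.random.default_rng(seed=1)
a = rng.beta(12, 10, size=200000)    # expert A: 11 correct out of 20
b = rng.beta(15, 7, size=200000)     # expert B: 14 correct out of 20
print("confidence that B is better:", (b > a).mean())   # about 0.83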
Example 8.3 Run of luck

If I toss a coin 10 times, what is the distribution of the maximum number of heads I will get in a row?
The solution is provided in the spreadsheet model of Figure 8.17.

+
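An equivalent simulation of Example 8.3, written as an illustrative Python sketch rather than the Figure 8.17 spreadsheet, is shown below: it tallies the distribution of the longest run of heads in 10 fair tosses.

# Example 8.3 by simulation: longest run of heads in 10 tosses of a fair coin.
import numpy as np

rng = np.random.default_rng(seed=1)

def longest_run(tosses):
    best = current = 0
    for t in tosses:
        current = current + 1 if t == 1 else 0
        best = max(best, current)
    return best

runs = [longest_run(rng.integers(0, 2, size=10)) for _ in range(50000)]
values, counts = np.unique(runs, return_counts=True)
for v, c in zip(values, counts):
    print(v, round(c / len(runs), 4))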

Example 8.4 Multiple-choice exam

A multiple-choice exam gives three options for each of 50 questions. One student scores 21 out of 50.
(a) What is the probability that the student would have achieved this score or higher without knowing
anything about the subject? (b) Estimate how many questions to which the student actually knew the
answer.
(a) The student has a 1/3 probability of getting any answer right without knowing anything, and his
or her possible score would then follow a Binomial(50, 1/3) distribution. The probability that the
student would have achieved 21/50 or higher is then = 1 - BINOMDIST(20, 50, 1/3, 1), i.e.
(1 - the probability of achieving 20 or lower).

Figure 8.16 Model for Example 8.2. (Formulae table: C5 (output) =IF(C4>C3,1,0).)

Figure 8.17 Model for Example 8.3.

Figure 8.18 Model for Example 8.4(b). (Likelihood, C3:C24: =BINOMDIST(21-B3,50,1/3,0); normalised posterior, D3:D24: =C3/$C$25; the embedded chart plots the posterior against the number of known answers, 0 to 21.)

(b) This is a Bayesian problem. Figure 8.18 illustrates a spreadsheet model of the Bayesian inference
with a flat prior and a binomial likelihood function. The embedded graph is the posterior distribution
of our belief about how many questions the student actually knew. +


8.9.2 Poisson process problems
In addition to the problems below, the reader will find the Poisson process appearing in the following
examples distributed through this book: examples in Sections 9.2.2 and 9.3.2 and Examples 9.6, 9.11,
22.12, 22.14 and 22.16.
Example 8.5 Insurance problem

My company insures aeroplanes. They crash at a rate of 0.23 crashes per month. Each crash costs
$Lognormal(120, 52) million. (a) What is the distribution of cost to the company for the next 5 years?
(b) What is the distribution of the value of the liability if I discount it at the risk-free rate of 5 %?
The solution to part (a) is provided in the spreadsheet model of Figure 8.19, which uses the VLOOKUP
Excel function. Part (b) requires that one know the time at which each accident occurred, using exponential distributions. The solution is shown in Figure 8.20. +
Example 8.6 Rainwater barrel problem

It is a monsoon and rain is falling at a rate of 270 drops per second per square metre. The rain drops
each contain 1 millilitre of water. If I have a drum standing in the rain, measuring 1 metre high and
0.3 metres radius, how long will it be before the drum is full?
The solution is provided in the spreadsheet model of Figure 8.21.

+

Figure 8.19 Model for Example 8.5(a).

Figure 8.20 Model for Example 8.5(b). (Inputs: mean crashes per month λ, number of months t, risk-free interest rate. The model simulates the time of each accident in months using exponential inter-arrival times, the cost of each accident in $M, and the cost discounted at the risk-free rate; the total cost is the sum of the discounted costs.)

Figure 8.21 Model for Example 8.6. (Drum radius 0.3 m gives a volume of 0.283 m³; drops fall into the barrel at 76.341 per second. Formulae table: number of drops needed =ROUNDUP(D4/0.000001,0); time to wait to fill barrel =Gamma(D6,1/D5), about 3714 seconds in the iteration shown.)
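The same waiting-time logic as Figure 8.21, written as an illustrative Python sketch: the drum needs volume/0.000001 one-millilitre drops, drops arrive over the drum's cross-section at 270 per second per square metre, and the time to accumulate that many Poisson events is a gamma distribution.

# Example 8.6: time to fill the drum as a gamma waiting time.
import math
import numpy as np

rng = np.random.default_rng(seed=1)
radius, height = 0.3, 1.0
area = math.pi * radius ** 2                      # m^2
volume = area * height                            # m^3, about 0.283
drops_needed = math.ceil(volume / 0.000001)       # 1 ml = 1e-6 m^3 per drop
rate = 270 * area                                 # drops per second into the drum, about 76.3

time_to_fill = rng.gamma(drops_needed, 1 / rate, size=10000)   # seconds
print("mean time to fill (s):", time_to_fill.mean())           # about 3700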

Example 8.7 Equipment reliability

A piece of electronic equipment is composed of six components A to F. They have the mean time
between failures shown in Table 8.4. The components are in serial and parallel configuration as shown
in Figure 8.22. What is the probability that the machine will fail within 250 hours?


Table 8.4 Mean time between failures (MTBF, hours) of the electronic equipment components A to F.

Figure 8.22 Model for Example 8.7. (Formulae table: D3:D8 =Expon(C3); D10 (output) =MIN(D3, MAX(D4:D6), MAX(D7:D8)); the time to failure of the system in the iteration shown is 210.1 hours.)

We first assume that the components will fail with a constant probability per unit time, i.e. that their
times to failure will be exponentially distributed, which is a reasonable assumption implied by the
MTBF figure. The problem belongs to reliability engineering. Components in series make the machine
fail if any of the components in series fail. For parallel components, all components in parallel must fail
before the machine fails. Thus, from Figure 8.22 the machine will fail if A fails, or B, C and D all fail,
or E and F both fail. Figure 8.22 also shows the spreadsheet modelling the time to failure. Running a
simulation with 10 000 iterations on cell D10 gives an output distribution of which 63.5 % of the trials
were less than 250 hours. +
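The same reliability logic can be sketched in Python, as below. The system lifetime is MIN(A, MAX(B, C, D), MAX(E, F)) with exponential component lifetimes; note that the MTBF values used here are placeholders, since Table 8.4's figures are not reproduced above, so substitute the actual component MTBFs before reading anything into the result.

# Example 8.7 by simulation: probability the system fails within 250 hours.
import numpy as np

rng = np.random.default_rng(seed=1)
mtbf = {"A": 1000, "B": 500, "C": 500, "D": 500, "E": 800, "F": 800}   # hours (placeholders)

def system_lifetime():
    t = {c: rng.exponential(m) for c, m in mtbf.items()}   # exponential time to failure
    return min(t["A"], max(t["B"], t["C"], t["D"]), max(t["E"], t["F"]))

lifetimes = np.array([system_lifetime() for _ in range(20000)])
print("P(failure within 250 hours):", (lifetimes < 250).mean())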

8.9.3 Hypergeometric process problems
In addition to the problems below, the reader will find the hypergeometric process appearing in the
following examples distributed through this book: examples in Sections 22.4.2 and 22.4.4, as well as
Examples 9.2, 9.3, 22.4, 22.6 and 22.8.
Example 8.8 Equal selection

I am to pick out at random 10 names from each of two bags. The first bag contains the names of 15
men and 22 women. The second bag contains the names of 12 men and 15 women. (a) What is the

200

Risk Analysis

probability that I will have the same proportion of men in the two selections? (b) How many times
would I have to sample from these bags before I did have the same proportion?
(a) The solution can be worked out mathematically or by simulation. Figure 8.23 provides the mathematical calculation and Figure 8.24 provides a simulation model, where the required probability
is the mean of the output result.
Men in sample    Bag 1      Bag 2      Both
0                 0.19%      0.04%     0.00%
1                 2.14%      0.71%     0.02%
2                 9.64%      5.03%     0.49%
3                22.28%     16.78%     3.74%
4                29.24%     29.37%     8.59%
5                22.70%     28.19%     6.40%
6                10.51%     14.95%     1.57%
7                 2.84%      4.27%     0.12%
8                 0.43%      0.62%     0.00%
9                 0.03%      0.04%     0.00%
10                0.00%      0.00%     0.00%
Total probability of same in each bag    20.93%

Formulae table: C9:D19 =HYPGEOMDIST($B9,C$3,C$4,C$5); E9:E19 =C9*D9; E20 (output) =SUM(E9:E19).

Figure 8.23 Mathematical model for Example 8.8.

Figure 8.24 Simulation model for Example 8.8. (Formulae table: =Hypergeo(C3,C4,C5); C8 (output - p = o/p mean) =IF(C7=D7,1,0).)

Figure 8.25 Model for Example 8.9. (C4:C22 holds the inverse hypergeometric probabilities calculated with COMBIN functions, as in Section 8.4.2; the chart plots the probability against the number of cards turned over, 0 to 50.)

(b) Each trial is independent of every other, so the number of trials before one success
= 1 + NegBin(1, p) = 1 + Geometric(p), where p is the probability calculated from part (a). +

Example 8.9 Playing cards

How many cards in a well-shuffled pack, complete with jokers, do I need to turn over to see a heart?
There are 54 (= M) cards, of which 13 (= D) are hearts, and I am looking for s = 1 heart. The
number of cards I must turn over is given by the formula 1 + InvHypergeo(1, 13, 54), which is the distribution
shown in Figure 8.25. +
Example 8.10 Faulty tyres

A tyre manufacturer has accidentally mixed up four tyres from a faulty batch with 20 other good tyres.
Testing a tyre for the fault ruins it. If each tyre cost $75, and if the tyres are tested one at a time until
the four faulty tyres are found, how much will this mistake cost?
The solution is provided in the spreadsheet model of Figure 8.26. +
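An equivalent simulation, written as an illustrative Python sketch rather than the Figure 8.26 model, is shown below. One possible costing is used ($75 for every good tyre ruined in testing); adjust this line to match whatever costing convention the spreadsheet model uses.

# Example 8.10 by simulation: tyres tested one at a time, without replacement,
# until all four faulty tyres have been found.
import numpy as np

rng = np.random.default_rng(seed=1)
batch = np.array([1] * 4 + [0] * 20)        # 1 = faulty, 0 = good

def tyres_tested():
    order = rng.permutation(batch)
    return np.argmax(np.cumsum(order) == 4) + 1   # tests needed to find the 4th faulty tyre

tested = np.array([tyres_tested() for _ in range(20000)])
cost = 75 * (tested - 4)                    # $75 per good tyre ruined (one possible costing)
print("mean tyres tested:", tested.mean())
print("mean cost of ruined good tyres ($):", cost.mean())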

8.9.4 Renewal and mixed process problems
In addition to the problems below, Examples 12.8 and 12.9 also deal with renewal and mixed process
problems.
Example 8.11 Batteries

A certain brand of batteries lasts Weibull(2,27) hours in my CD player, which takes two batteries at
a time. I have a pack of 10 batteries. For how long can I run my CD player, given that I replace all
batteries when one has run down?
The solution is provided in the spreadsheet model of Figure 8.27. +
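The logic of Figure 8.27 is that each pair of batteries lasts until the first of the two runs down (the minimum of two Weibull(2, 27) lifetimes), and the 10-battery pack provides five such pairs. The Python sketch below illustrates that logic; it is not the spreadsheet model itself.

# Example 8.11 by simulation: total playing time from five pairs of batteries.
import numpy as np

rng = np.random.default_rng(seed=1)
shape, scale, pairs = 2.0, 27.0, 5
lifetimes = scale * rng.weibull(shape, size=(50000, pairs, 2))   # hours per battery
playing_time = lifetimes.min(axis=2).sum(axis=1)                 # hours per pack
print("mean playing time (hours):", playing_time.mean())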
Example 8.12 Queuing at a bank (Visual Basic modelling with Monte Carlo simulation)

A post office has one counter that it recognises is insufficient for its customer volume. It is considering
putting in another counter and wishes to model the effect on the maximum number in a queue at any
one time. They are open from 9 a.m. to 5 p.m. each working day. Past data show that, when the doors


Figure 8.26 Model for Example 8.10. (C4:C22 holds the probability of each possible number of tested tyres, calculated with COMBIN functions as in Section 8.4.2; the number of tyres actually tested is =D+Discrete(B4:B22,C4:C22).)

Figure 8.27

Model for Example 8.11

open at 9 a.m., the number of people waiting to come in will be as shown in Table 8.5. People arrive
throughout the day at a constant rate of one every 12 minutes. The amount of time it takes to serve
each person is Lognormal(29, 23) minutes. What is the maximum queue size in a day?
This problem requires that one simulate a day, monitor the maximum queue size during the day
and then repeat the simulation. One thus builds up a distribution of the maximum number in a queue.
The solution provided in Figures 8.28 and 8.29 and in the following program runs a looping Visual
Basic macro called "Main Program" at each iteration of the model. This is an advanced technique and,
although this problem is very simple, one can see how it can be greatly extended. For example, one
could change the rate of arrival of the customers to be a function of the time of day; one could add


Table 8.5 Historic data on the number of people waiting at the start of the business day.

People    Probability
0         0.6
1         0.2
2         0.1
3         0.05
4         0.035
5         0.015

Figure 8.28 Sheet "model" for the model for Example 8.12. (Inputs: average interarrival time 12 minutes, serving time mean 29 and standard deviation 23 minutes. The model tracks the people in the queue, the time of day in minutes from 00:00:00 and the latest customer at counters 1 and 2 (customer, arrive time, serving time, finish time), and outputs the total customers served and the maximum number in the queue. Formulae table: C8:C9, C11:E12, C15:C16 values updated by macro; F11:F12 =E11+D11.)

more counters, and one could monitor other statistical parameters aside from the maximum queue size,
like the maximum amount of time any one person waits or the amount of free time the people working
behind the counter have. +

Visual Basic macro for Example 8.12
'Set model variables
Dim modelWS As Object
Dim variableWS As Object

Sub Main_Program()
    Set modelWS = Workbooks("queue_model_test.xls").Worksheets("model")
    Set variableWS = Workbooks("queue_model_test.xls").Worksheets("variables")
    'Reset the model with the starting values
    modelWS.Range("c9").Value = 9 * 60


Figure 8.29 Sheet "variables" for the model for Example 8.12. (Random variables: counter serving time (min), customers arriving while serving, wait time for next customer, people waiting at 9:00 a.m., time in last step. Formulae table: B2 =Lognorm(Model!$C$4,Model!$C$5); B4 =IF(B10=0,0,Poisson(B10/Model!C3)); B6 =Expon(Model!C3); B8 =Discrete({0,1,2,3,4,5},{0.6,0.2,0.1,0.05,0.035,0.015}); B10 value updated by macro.)

    'Start serving customers
    Serve_First_Customer
    Serve_Next_Customer
End Sub

Sub Serve_First_Customer()
    'Serve at counter 1 if 0 ppl in queue
    If modelWS.Range("c8") = 0 Then
        modelWS.Range("c9") = modelWS.Range("c9").Value + variableWS.Range("b6").Value
        modelWS.Range("c8") = 1
        Application.Calculate
        'MsgBox "wait 1"
        Routine_A
    End If
    'Serve at counter 1 if 1 person in queue
    If modelWS.Range("c8") = 1 Then
        Routine_A
    End If
    'Serve at counter 1 and 2 if 2 or more ppl in queue
    If modelWS.Range("c8") >= 2 Then
        Routine_A
        Routine_B
    End If
End Sub

Sub Serve_Next_Customer()
    'Calculate the new time of day
    variableWS.Range("b10") = Evaluate("=Max(Model!C9,Min(model!F11,model!F12)) - Model!C9")
    modelWS.Range("c8") = modelWS.Range("c8").Value + variableWS.Range("B4").Value
    'Calculate the maximum number of people left in queue
    modelWS.Range("C16") = Evaluate("=max(model!c16,model!c8)")
    Application.Calculate
    modelWS.Range("c9") = modelWS.Range("c9").Value + variableWS.Range("B10").Value
    Application.Calculate
    'MsgBox "wait 3"
    'Check how many ppl are in queue
    If modelWS.Range("c8") = 0 Then
        modelWS.Range("c9") = modelWS.Range("c9").Value + variableWS.Range("b6").Value
        modelWS.Range("c8") = 1
    End If
    Application.Calculate
    If modelWS.Range("c9") > 1020 Then Exit Sub
    If modelWS.Range("f11") <= modelWS.Range("f12") Then
        Routine_A
    Else
        Routine_B
    End If
    Application.Calculate
    Serve_Next_Customer
End Sub

'Next customer for counter 1
Sub Routine_A()
    modelWS.Range("c11") = 1
    modelWS.Range("D11") = modelWS.Range("c9").Value
    Application.Calculate
End Sub

'Next customer for counter 2
Sub Routine_B()
    modelWS.Range("c12") = 1
    modelWS.Range("d12") = modelWS.Range("c9").Value
    Application.Calculate
    modelWS.Range("e12") = variableWS.Range("B2").Value
    modelWS.Range("c15") = modelWS.Range("c15") + 1
    modelWS.Range("c8") = modelWS.Range("c8") - 1
    modelWS.Range("C12") = 0
    Application.Calculate
End Sub


Chapter 9

Data and statistics
Statistics is the discipline of fitting probability models to data. In this chapter I go through a number of
basic statistical techniques from the simple z-tests and t-tests of the classical statistics world, through
the basic ideas behind Bayesian statistics and looking at the application of simulation in statistics - the
bootstrap for classical statistics and Markov chain Monte Carlo modelling for Bayesian statistics. If
you have some statistics training you may think my approach is rather inconsistent, as I have no
problems with using Bayesian and classical methods in the same model in spite of the philosophical
inconsistencies between them. That's because classical statistics is still the most readily accepted type
of statistical analysis - so a model using these methods is less contentious among certain audiences, but
on the other hand Bayesian statistics can solve more problems. Moreover, Bayesian statistics is more
consistent with risk analysis modelling because we need to simulate uncertainty about model parameters
so that we can see how that uncertainty propagates through a model to affect our ability to predict the
outputs of interest, not just quote confidence intervals.
There are a few key messages I would like you to take away from this chapter. The first is that statistics
is subjective: the choice of model that we are fitting to our data is a highly subjective decision. Even the
most established statistical tests, like the z-test, t-test, F-test, chi-square test and regression models, have
at their heart the (subjective) assumption that the underlying variable is normally distributed - which
is very rarely the truth. These tests are really old - a hundred years old - and came to be used so
much because one could restructure a number of basic problems into a form of one of these tests and
look up the confidence values in published tables. We don't use tables anymore - well, we shouldn't
anyway - they aren't very accurate and even basic software like Excel can give you the answers directly.
It's rather strange, then, that statistics books often still publish such tables.
The second key message is that statistics does not need to be a black box. With a little understanding
of probability models, it can become quite intuitive.
The third is that there is ample room in statistics for creative thinking. If you have access to simulation
methods, you are freed from having to find the right "test" for your particular problem. Most real-world
problems are too complex for standardised statistical testing.
The fourth is that statistics is intimately related to probability modelling. You won't understand
statistics until you've understood probability theory, so learn that first.
And lastly, statistics can be really quite a lot of fun as well as very informative. It's rare that a person
coming to one of our courses is excited about the statistics part, and I can't blame them, but I like
to think that they change their mind by the end. I studied mathematics and physics at undergraduate
level and came away with really no useful appreciation of statistics, just a solid understanding of how
astonishingly boring it was, because statistics was taught to me as a set of rules and equations, and any
explanation of "Why?" was far beyond what we could hope to understand (at the same time we were
learning about general relativity theory, quantum electrodynamics, etc.).
At the beginning of this book, I discussed the importance of being able to distinguish between
uncertainty (or epistemic uncertainty) and variability (or stochastic uncertainty). This chapter lays out a


number of techniques that enable one quantitatively to describe the uncertainty (epistemic uncertainty)
associated with the parameters of a model. Uncertainty is a function of the risk analyst, inasmuch as it
is the description of the state of knowledge the risk analyst's clients have about particular parameters
within his or her model.
A quantitative risk analysis model is structured around modelling the variability (randomness) of the
world. However, we have imperfect knowledge of the parameters that define that model, so we must
estimate their values from data, and, because we have only finite amounts of data, there will remain
some uncertainty that we have to layer over our probability model. This chapter is concerned with
determining the distributions of uncertainty for these parameters.
I will assume that the analyst has somehow accumulated a set of data X = {x1, x2, ..., xn} of n data
points that has been obtained in some manner as to be considered a random sample from a random
process. The purpose of this chapter will be to determine the level of uncertainty, given these available
data, associated with some parameter or parameters of the probability model.
It will be useful here to set out some simple terminology:
The estimate of some statistical parameter of the parent distribution with true (but unknown) value,
say p, is denoted by a hat, e.g. p̂.

The sample mean of the dataset X is denoted by x̄, i.e.

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\]

The (unbiased) sample standard deviation of the dataset X is denoted by s, i.e.

\[
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\]

The true mean and standard deviation of the population distribution are denoted by μ and σ respectively.

9.1 Classical Statistics
The classical statistics techniques we all know (or at least remember we were once taught) are the
z-test, t-test and chi-square test. They allow us to estimate the mean and variance of a random variable
for which we have some randomly sampled data, as well as a number of other problems. I'm going to
offer some fairly simple ways of understanding these statistical tests, but I first want to explain why
the "tests" aren't much good to us as risk analysts in their standard form. Let's take a typical t-test
result: it will say something like the true mean = 9.63 with a 95 % confidence interval of [9.32, 9.941,
meaning that we are 95 % sure that the true mean lies between 9.32 and 9.94. It doesn't mean that
there is a 95 % probability that it will lie within these values - it either does or does not, what we are
describing is how well we (the data holders, i.e. it is subjective) know the mean value. In risk analysis
I may have several such parameters in my model. Let's say we have just three such parameters A , B
and C estimated from different datasets and each with its best estimate and 95 % confidence bound. Let
the model be A*BA(l/C). How can I combine these numbers to make an estimate of the uncertainty of
my calculation? The answer is I can't. However, if I could convert the estimates to distributions I could
perform a Monte Carlo simulation and get the answer at any confidence interval, or any percentile the
decision-maker wishes. Thus, we have to convert these classical tests to distributions of uncertainty.


The classical statistics tests above are based on two basic statistical principles:
1. The pivotal method. This requires that I rearrange an equation so that the parameter being estimated
is separated from any random variable.
2. A sufficient statistic. This means a sample statistic calculated from the data that contains all the
information in the data that is related to estimating the parameter.
I'll use these ideas to explain the tests above and how they can be converted to uncertainty distributions.

9.1.1 The z-test
The z-test allows us to determine the best estimate and confidence interval for the mean of a normally
distributed population where we happen to know the standard deviation of that population. That would be
quite an unusual situation since the mean is usually more fundamental than the standard deviation, but
does occur sometimes; for example, when we take repeated measures of some quantity (like the length
of a room, the weight of a beam). In this situation the random variable is not the length of the room,
etc., but the results we will get. Look at the manual of a scientific measuring instrument and it should
tell you the accuracy (e.g. f1 mm). Sadly, the manufacturers don't usually tell us how to interpret these
values - will the measurement lie within 1mm of the true value 68 % (1 standard deviation), 95 % (two
standard deviations), etc., of the time? If the instrument manual were to say the measurement error has
a standard deviation of 1 mm, we could apply the z-test.
Let's say we are measuring some fixed quantity and that we take n such measurements. The sample
mean is given by the formula

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\]

Here x̄ is the sufficient statistic. If the errors are normally distributed with mean μ and standard deviation
σ we have

\[
\bar{x} = \mu + \text{Normal}(0, 1)\frac{\sigma}{\sqrt{n}}
\]

Note how we have managed to rearrange the equation to place the random element Normal(0, 1) apart
from the parameter we are trying to estimate. Now, thanks to the pivotal method, we can rearrange to
make μ the focus:

\[
\mu = \bar{x} - \text{Normal}(0, 1)\frac{\sigma}{\sqrt{n}}
\]

In the z-test we would have specified a confidence interval, say 95 %, and then looked up the "z-score"
values for a Normal(0, 1) distribution that would correspond to 2.5 % and 97.5 % (i.e. centrally positioned
values with 95 % between them), which are -1.95996 and +1.95996 respectively.¹ Then we'd write

\[
\mu_{\text{low}} = \bar{x} - 1.95996\frac{\sigma}{\sqrt{n}}, \qquad \mu_{\text{high}} = \bar{x} + 1.95996\frac{\sigma}{\sqrt{n}}
\]

You can get these values with ModelRisk using VoseNormal(0, 1, 0.025) and VoseNormal(0, 1, 0.975) or in Excel with =
NORMSINV(0.025) and = NORMSINV(0.975).


to get the lower and upper bounds respectively. In a risk analysis simulation we just use

\[
\mu = \text{Normal}\!\left(\bar{x}, \frac{\sigma}{\sqrt{n}}\right) \tag{9.1}
\]

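As an illustration of what "just use" means in practice, the minimal Python sketch below simulates the uncertainty distribution for μ from Equation 9.1; the data values and the known measurement standard deviation are arbitrary illustration values.

# Simulate the z-test uncertainty distribution for the mean: mu = Normal(xbar, sigma/sqrt(n)).
import numpy as np

rng = np.random.default_rng(seed=1)
x = np.array([10.2, 9.8, 10.1, 9.9, 10.3, 10.0])   # illustrative measurements
sigma = 0.15                                        # known std dev of the measurement error
xbar, n = x.mean(), len(x)

mu = rng.normal(xbar, sigma / np.sqrt(n), size=100000)   # uncertainty about mu
print("95% interval for mu:", np.percentile(mu, [2.5, 97.5]))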
9.1.2 The chi-square test
The chi-square (χ²) test allows us to determine the best estimate and confidence interval for the standard
deviation of a normally distributed population. There are two situations: we either know the mean μ or
we don't. Knowing the mean seems like an unusual scenario but happens, for example, when we are
calibrating a measuring device against some known standard. In this case, the formula for the sample
variance is given by

\[
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2
\]

The sample variance in this case is the sufficient statistic for the population variance. Rewriting to
get a pivotal quantity, we have

\[
\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2 = \frac{n\hat{\sigma}^2}{\sigma^2}
\]

However, the sum of n unit normal distributions squared is the definition of a chi-square distribution.
Rearranging, we get

\[
\sigma^2 = \frac{n\hat{\sigma}^2}{\text{ChiSq}(n)} \tag{9.2}
\]

A χ²(n) distribution has mean n, so this formula is simply multiplying the sample variance by a
random variable with mean 1. The chi-square test finds, say, the 2.5 and 97.5 percentiles² and inserts
them into the above equation. For example, these percentiles for 10 degrees of freedom are 3.247 and
20.483. Since we are dividing by the chi-square random variable, the upper estimate corresponds to the
lower chi-square value, and vice versa:

\[
\sigma^2_{\text{low}} = \frac{n\hat{\sigma}^2}{20.483}, \qquad \sigma^2_{\text{high}} = \frac{n\hat{\sigma}^2}{3.247}
\]

In risk analysis modelling we would instead simulate values for σ using Equation 9.2:

\[
\sigma = \hat{\sigma}\sqrt{\frac{n}{\text{ChiSq}(n)}}
\]

In ModelRisk use VoseChiSq(n, 0.025) and VoseChiSq(n, 0.975), and in Excel use CHIINV(0.975, n) and CHIINV(0.025, n)
respectively.


Now let's consider what happens when we don't know the population mean, in which case statistical
convention says that we use a slightly different formula for the sample variance measure:

\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
\]

However, for a normal distribution it turns out that

\[
\frac{(n-1)s^2}{\sigma^2} = \text{ChiSq}(n-1)
\]

Rearranging, we get

\[
\sigma^2 = \frac{(n-1)s^2}{\text{ChiSq}(n-1)}
\]

9.1.3 The t-test
The t-test allows us to determine the best estimate and confidence interval for the mean of a normally
distributed population where we don't know its standard deviation. From Equation 9.1 we had the result

\[
\mu = \bar{x} + \text{Normal}(0, 1)\frac{\sigma}{\sqrt{n}}
\]

when the population variance was known, and from Equation 9.2 we had the estimate for the variance
when the mean is unknown:

\[
\sigma = s\sqrt{\frac{n-1}{\text{ChiSq}(n-1)}}
\]

Substituting for σ, we get

\[
\mu = \bar{x} + \text{Normal}(0, 1)\sqrt{\frac{n-1}{\text{ChiSq}(n-1)}}\,\frac{s}{\sqrt{n}}
\]

The definition of a Student(ν) distribution is a normal distribution with mean 0 and variance following
a random variable ν/ChiSq(ν), so we have

\[
\mu = \bar{x} + \text{Student}(n-1)\frac{s}{\sqrt{n}} \tag{9.4}
\]


Knowing that the Student t-distribution is just a unit normal distribution with some randomness
about the variance explains why a Student distribution has longer tails than a normal. The Student(v)
distribution has variance v/(v - 2), v > 2, so at v = 3 the variance is 3 and rapidly decreases, so that
by v = 30 it is only 1.07 (a standard deviation of 1.035) and for v = 50 a standard deviation of 1.02.
The practical implication is that, when you have, say, 50 data points, there is only a 2 % difference
in the confidence interval range whether you use a t-test (Equation 9.4) or approximate with a z-test
(Equation 9.1), using the sample variance in place of σ².

9.1.4 Estimating a binomial probability or a proportion
In many problems we need to determine a binomial probability (e.g. the probability of a flood in a
certain week of the year) or a proportion (e.g. the proportion of components that are made to a certain
tolerance). In estimating both, we collect data. Each measurement point is a random variable that has a
probability p of having the characteristic of interest. If all measurements are independent, and we assign
a value to the measurement of 1 when the measurement has the characteristic of interest and 0 when
it does not, the measurements can be thought of as a set of Bernoulli trials. Letting P be the random
variable of the proportion of n of this set of trials {Xi} that have the characteristic of interest, it will
take a distribution given by

\[
P = \frac{\text{Binomial}(n, p)}{n} \tag{9.5}
\]

We observe a proportion p̂ of the n trials that have the characteristic of interest, our one observation
from the random variable P, which is also our MLE (see later) and unbiased estimate for p. Switching
around Equation (9.5), we can get an uncertainty distribution for the true value of p:

\[
p = \frac{\text{Binomial}(n, \hat{p})}{n} \tag{9.6}
\]

We shall see later how this exactly equates to the non-parametric and parametric bootstrap estimates of
a binomial probability. Equation (9.6) is a bit awkward since it will allow only (n + 1) discrete values for
p, i.e. {0, 1/n, 2/n, ..., (n - 1)/n, 1}, whereas our uncertainty about p should really take into account
all values between zero and 1. However, a Binomial(n, p̂) has a mean and standard deviation given by

\[
\text{mean} = n\hat{p}, \qquad \text{standard deviation} = \sqrt{n\hat{p}(1-\hat{p})}
\]

and, from the central limit theorem, as n gets large the proportion of observations P will tend to a
normal distribution, in which case Equation (9.6) can be rewritten as

\[
p = \text{Normal}\!\left(\hat{p}, \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)
\]
Equation (9.6) gives us what is known as the "exact binomial confidence interval", which is an awful
name in my view because it actually gives us bounds for which we have at least the required confidence
that the true value of p lies within. We never use this method. Another classical statistics method is
to construct a cumulative uncertainty distribution, which is far more useful. We start by saying that, if


Figure 9.1 Cumulative distributions of the estimate of p for n = 10 trials and varying numbers of successes s (plotted against the true value of the probability p).

we've observed s successes in n trials, the confidence that the true value of the probability is less than
some value x is given by

\[
P(p < x) = P(Y > s) + \tfrac{1}{2}P(Y = s)
\]

where Y = Binomial(n, x). In Excel we would write = 1 - BINOMDIST(s, n, x, 1) + 0.5*BINOMDIST(s, n, x, 0).

By varying the value x from 0 to 1, we can construct the cumulative confidence. For example, Figure 9.1
shows examples with n = 10.
This is an interesting method. Look at the scenario for s = 0: the cumulative distribution starts with a
value of 50 % at p = 0, so it is saying that, with no successes observed, we have 50 % confidence that
there is no binomial process at all - trials can't become successes - and the remaining 50 % confidence
is distributed over p = (0, 1). The reverse logic applies where s = n. In ModelRisk we have a function
VoseBinomialP(s, n, ProcessExists, U), where you input the successes s and trials n and, in the situation
where s = 0 or s = n, you have the option to specify whether you know that the probability lies within
(0, 1) (ProcessExists = TRUE). The U-parameter also allows you to specify a cumulative percentile - if
omitted, the function simulates random values of what the value p might be. So, for example:
VoseBinomialP(10, 20, TRUE, 0.99) = VoseBinomialP(10, 20, FALSE, 0.99) = 0.74605
VoseBinomialP(0, 20, TRUE, 0.99) = 0.02522 (it assumes that p cannot be zero)
VoseBinomialP(0, 20, FALSE, 0.4) = 0 (it allows that p could be zero)

9.1.5 Estimating a Poisson intensity
In a Poisson process, countable events occur randomly in time or space - like earthquakes, financial
crashes, car crashes, epidemics and customer arrivals. We need to estimate the base rate λ at which
these events occur. So, for example, a city of 500 000 people may have had α murders last year: perhaps
that was unluckily high, or luckily low. We'd like to know the degree of accuracy that we can place


around the statement "The risk is α murders per year". Following a classical statistics approach similar
to Section 9.1.4, we could write

\[
\lambda = \frac{\text{Poisson}(\alpha)}{1}
\]

where 1 refers to the single year of counting.
We could recognise that a Poisson(α) distribution has mean and variance equal to α and looks normal when
α is large:

\[
\lambda \approx \text{Normal}\!\left(\alpha, \sqrt{\alpha}\right)
\]

The method suffers the same problems as the binomial: if we haven't yet observed any murders this
year, the formulae don't work. A classical statistics alternative is again to construct the cumulative
confidence distribution, using the same construction as for the binomial probability above but with
Y = Poisson(xt).
Figure 9.2 shows some examples of the cumulative distribution that can be constructed from this
formula. In ModelRisk there is a function VosePoissonLambda(α, t, ProcessExists, U), where you input
the counts α and the time over which they have been observed t, and in the situation where α = 0 you
have the option to specify whether you know that the intensity is non-zero (ProcessExists = TRUE).
The U-parameter also allows you to specify a cumulative percentile - if omitted, the function simulates
random values of what the value λ might be. So, for example:
VosePoissonLambda(2, 3, TRUE, 0.2) = VosePoissonLambda(2, 3, FALSE, 0.2)
VosePoissonLambda(0, 3, TRUE, 0.2) = 0.203324 (it assumes that λ cannot be zero)
VosePoissonLambda(0, 3, FALSE, 0.2) = 0 (it allows that λ could be zero)

Figure 9.2 Cumulative distributions of the estimate of λ for varying numbers of observations α (plotted against the true value of the Poisson intensity λ, 0 to 20).


9.2 Bayesian Inference
The Bayesian approach to statistics has enjoyed something of a renaissance over the latter half of the
twentieth century, but there still remains a schism among the scientific community over the Bayesian
position. Many scientists, and particularly many classically trained statisticians, believe that science
should be objective and therefore dislike any methodology that is based on subjectivism. There are, of
course, a host of counterarguments. Experimental design is subjective to begin with; classical statistics are
limited in that they make certain assumptions (normally distributed errors or populations, for example)
and scientists have to use their judgement in deciding whether such an assumption is sufficiently well
met; moreover, at the end of a statistical analysis one is often asked to accept or reject a hypothesis by
picking (quite subjectively) a level of significance (p values).
For the risk analyst, subjectivism is a fact of life. Each model one builds is only an approximation
of the real world. Decisions about the structure and acceptable accuracy of the risk analyst's model are
very subjective. Added to all this, the risk analyst must very often rely on subjective estimates for many
model inputs, frequently without any data to back them up.
Bayesian inference is an extremely powerful technique, based on Bayes' theorem (sometimes called
Bayes' formula), for using data to improve one's estimate of a parameter. There are essentially three
steps involved: (1) determining a prior estimate of the parameter in the form of a confidence distribution;
(2) finding an appropriate likelihood function for the observed data; (3) calculating the posterior (i.e.
revised) estimate of the parameter by multiplying the prior distribution and the likelihood function, then
normalising so that the result is a true distribution of confidence (i.e. the area under the curve equals 1).
The first part of this section introduces the concept and provides some simple examples. The second
part explains how to determine prior distributions. The third part looks more closely at likelihood
functions, and the fourth part explains how normalising of the posterior distribution is carried out.

9.2.1 Introduction
Bayesian inference is based on Bayes' theorem (Section 6.3.5), the logic of which was first proposed
in Bayes (1763). Bayes' theorem states that

\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\]

We will change the notation of that formula for the purpose of explaining Bayesian inference to a
notation often used in the Bayesian world:

\[
f(\theta \mid x) = \frac{\pi(\theta)\,l(x \mid \theta)}{\int \pi(\theta)\,l(x \mid \theta)\,d\theta} \tag{9.8}
\]

Bayesian inference mathematically describes the learning process. We start off with an opinion, however
vague, and then modify our opinion when presented with evidence. The components of Equation (9.8) are:
π(θ) - the "prior distribution". π(θ) is the density function of our prior belief about the parameter
value θ before we have observed the data x. In other words, π(θ) is not a probability distribution of
θ but rather an uncertainty distribution: it is an adequate representation of the state of our knowledge
about θ before the data x were observed.


l(x|θ) - the "likelihood function". l(x|θ) is the calculated probability of randomly observing the
data x for a given value of θ. The shape of the likelihood function embodies the amount of
information contained in the data. If the information it contains is small, the likelihood function
will be broadly distributed, whereas if the information it contains is large, the likelihood function
will be very focused around some particular value of the parameter. However, if the shape of the
likelihood function corresponds strongly to the prior distribution, the amount of extra information
the likelihood function embodies is relatively small and the posterior distribution will not differ
greatly from the prior. In other words, one would not have learned very much from the data. On
the other hand, if the shape of the likelihood function is very different from the prior we will have
learned a lot from the data.
f(θ|x) - the "posterior distribution". f(θ|x) is the description of our state of knowledge of θ after
we have observed the data x and given our opinion of the value of θ before x was observed.

The denominator in Equation (9.8) simply normalises the posterior distribution to have a total area
equal to 1. Since the denominator is simply a scalar value and not a function of θ, one can rewrite
Equation (9.8) in a form that is generally more convenient:

\[
f(\theta \mid x) \propto \pi(\theta)\,l(x \mid \theta) \tag{9.9}
\]

The ∝ symbol means "is proportional to", so this equation shows that the value of the posterior distribution density function, evaluated at some value of θ, is proportional to the product of the prior
distribution density function at that value of θ and the likelihood of observing the dataset x if that
value of θ were the parameter's true value. It is interesting to observe that Bayesian inference is thus
not interested in absolute values of the prior and likelihood function, but only their shapes. In writing
equations of the form of Equation (9.9), we are taking as read that one will eventually have to normalise
the distribution.
Bayesian inference seems to confuse a lot of people rather quickly. I have found that the easiest way
to understand it, and to explain it, is through examples.
Example 9.1

I have three "loonies7' (Canadian one dollar coins - they have a loon on the tail face) in my pocket.
Two of them are regular coins, but the third is a weighted coin that has a 70 % chance of landing heads
up. I cannot tell the coins apart on inspection. I take a coin out of my pocket at random and toss it - it
lands heads up. What is the probability that the coin is the weighted coin?
Let's start by noting that the probability, as I have defined the term probability in Chapter 6.2, that
the coin is the weighted one is either 0 or 1: it either is not the weighted coin or it is. The problem
should really be phrased "What confidence do I have that the tossed coin is weighted?", as I am only
dealing with the state of my knowledge. When I took the coin out of my pocket but before I had tossed
it, I would have said I was 1/3 confident that the coin in my hand was weighted, and 2/3 confident it
was not weighted. My prior distribution π(θ) for the state of the coin would thus look like Figure 9.3,
i.e. a discrete distribution with two allowed values {not weighted, weighted} with confidences {2/3, 1/3}
respectively.
Now I toss the coin and it lands heads up. If the coin were fair, it would have a probability of 1/2 of
landing that way. My confidence that I took out a fair coin from my pocket and then tossed a head (call
it scenario A) is therefore proportional to my prior belief multiplied by the likelihood, i.e. 2/3 * 1/2 = 1/3.
On the other hand, I am also 1/3 confident that the coin could have been weighted, and then it would

Figure 9.3 Prior distribution for the weighted coin example: a Discrete({0, 1}, {2/3, 1/3}).

have had a probability of
of landing that way. My confidence that I took out the weighted coin from
my pocket and then tossed a head (call it scenario B) is therefore proportional to f *
=
The two values 1/2 and 7/10 used for the probability of observing a head were conditional on the type of coin that was being tossed. These two values represent, in this problem, the likelihood function. We will look at some more general likelihood functions in the following examples.
Now, we know that one of scenarios A and B must have actually occurred, since we did observe a head. We must therefore normalise my confidences for these two scenarios so that they add up to 1, i.e. 10/17 and 7/17. This normalising is the purpose of the denominator in Equation (9.8).
I am now 10/17 confident that the coin is fair and 7/17 confident that it is weighted: I still think it more likely that I tossed a fair coin than a weighted coin. Let us imagine that we toss the coin again and observe another head. How would this affect my confidence distribution for the state of the coin? Well, the posterior confidence of selecting a fair coin and observing two heads (scenario C) is proportional to 2/3 * 1/2 * 1/2 = 1/6. The posterior confidence of selecting the weighted coin and observing two heads (scenario D) is proportional to 1/3 * 7/10 * 7/10 = 49/300. Normalising these two, we get 50/99 and 49/99 respectively.
Now I am roughly equally confident about whether I tossed a fair or a weighted coin. Figure 9.4 depicts the posterior distributions for the above example, plus the posterior distributions for a few more tosses of the coin where each toss resulted in a head. One can see that, as the number of observations (data) we have grows, our prior belief gets swamped by what the data say is really possible, i.e. by the information contained in the data. +
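A minimal sketch of the same calculation in plain Python (not part of the book's spreadsheet models); it simply repeats the prior-times-likelihood step and the normalisation of Equation (9.8) for a run of heads:

```python
# Example 9.1 as a discrete Bayesian update: prior, likelihood of a head, then normalise.
prior = {"fair": 2 / 3, "weighted": 1 / 3}
p_head = {"fair": 0.5, "weighted": 0.7}

posterior = dict(prior)
for toss in range(1, 11):                                   # observe ten heads in a row
    unnormalised = {c: posterior[c] * p_head[c] for c in posterior}
    total = sum(unnormalised.values())                      # the normalising denominator
    posterior = {c: v / total for c, v in unnormalised.items()}
    print(toss, round(posterior["fair"], 3), round(posterior["weighted"], 3))
# after 1 head: 0.588 / 0.412 (i.e. 10/17 and 7/17); after 2 heads: 0.505 / 0.495 (50/99 and 49/99)
```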

Figure 9.4 Posterior distributions for the coin-tossing example with increasing numbers of heads (panels for 2, 3, 4, 5 and 10 heads in as many tosses; bars show confidence in "not weighted" and "weighted").

Example 9.2

A game warden on a tropical island would like to know how many tigers she has on her island. It
is a big island with dense jungle and she has a limited budget, so she can't search every inch of the
island methodically. Besides, she wants to disturb the tigers and the other fauna as little as possible.
She arranges for a capture-recapture survey to be carried out as follows.
Hidden traps are laid at random points on the island. The traps are furnished with transmitters that
signal a catch, and each captured tiger is retrieved immediately. When 20 tigers have been caught, the
traps are removed. Each of these 20 tigers is carefully sedated and marked with an ear tag, then all
are released together back to the positions where they were originally caught. Some short time later,
hidden traps are laid again, but at different points on the island, until 30 tigers have been caught and
the number of tagged tigers is recorded. Captured tigers are held in captivity until the 30th tiger has
been caught.
The game warden tries the experiment, and seven of the 30 tigers captured in the second set of traps
are tagged. How many tigers are there on the island?
The warden has gone to some lengths to specify the experiment precisely. This is so that we will
be able to assume within reasonable accuracy that the experiment is taking a hypergeometric sample
from the tiger population (Section 8.4). A hypergeometric sample assumes that an individual with the
characteristic of interest (in this case, a tagged tiger) has the same probability of being sampled as
any individual that does not have that characteristic (i.e. the untagged tigers). The reader may enjoy


thinking through what assumptions are being made in this analysis and where the experimental design
has attempted to minimise any deviation from a true hypergeometric sampling.
We will use the usual notation for a hypergeometric process:

n - the sample size = 30.
D - the number of individuals in the population of interest (tagged tigers) = 20.
M - the population (the number of tigers in the jungle). In the Bayesian inference terminology, this is given the symbol θ as it is the parameter we are attempting to estimate.
x - the number of individuals in the sample that have the characteristic of interest = 7.
We could get a best guess for M by noting that the most likely scenario would be for us to see tagged
tigers in the sample in the same proportion as they occur in the population. In other words
$$\frac{x}{n} \approx \frac{D}{M}, \qquad \text{i.e.} \quad \frac{7}{30} \approx \frac{20}{M}, \qquad \text{which gives } M \approx 85 \text{ to } 86$$

but this does not take account of the uncertainty that occurs owing to the random sampling involved in
the experiment. Let us imagine that before the experiment was started the warden and her staff believed
that the number of tigers was equally likely to be any one value as any other. In other words, they
knew absolutely nothing about the number of tigers in the jungle, and their prior distribution is thus a
discrete uniform distribution over all non-negative integers. This is rather unlikely, of course, but we
will discuss better prior distributions in Section 9.2.2.
The likelihood function is given by the probability mass function of the hypergeometric distribution, i.e.

$$l(x\,|\,\theta) = \begin{cases} \dfrac{\dbinom{D}{x}\dbinom{\theta - D}{n - x}}{\dbinom{\theta}{n}} & \text{for } \theta \ge D + (n - x) \\[3mm] 0 & \text{otherwise} \end{cases}$$

The likelihood function is 0 for values of θ below 43, as the experiment tells us that there must be at least 43 tigers: the 20 that were tagged plus the (30 - 7) that were caught in the recapture part of the experiment and were not tagged.
The probability mass function (Section 6.1.2) applies to a discrete distribution and equals the probability that exactly x events will occur. Excel provides a convenient function HYPGEOMDIST(x, n, D, M) that will calculate the hypergeometric probability mass function automatically, but it generates errors instead of zero when θ < 43, so I have used the equivalent ModelRisk function. Figure 9.5 illustrates a spreadsheet where a discrete uniform prior, with values of θ running from 43 to 150, is multiplied by the likelihood function above to arrive at a posterior distribution. We know that the total confidence must add up to 1, which is done in column F to produce the normalised posterior distribution. The shape of this posterior distribution is shown in Figure 9.6 by plotting column B against column F from the spreadsheet. The graph peaks at a value of 85, as we would expect, but it appears cut off at the right tail, which shows that we should also look at values of θ larger than 150. The analysis is repeated for values of θ up to 300, and this more complete posterior distribution is plotted in Figure 9.7. This second plot represents a good model of the state of the warden's knowledge about the number of tigers on the island. Don't forget that this is a distribution of belief and is not a true probability distribution, since there is an exact number of tigers on the island.

Formulae table
C3:C6      constants
B10:B117   {43, ..., 150}
C10:C117   1
D10:D117   =VoseHypergeoProb(x,n,D,B10)
E10:E117   =D10*C10
E7         =SUM(E10:E117)
F10:F117   =E10/$E$7

Figure 9.5 Bayesian inference model for the tiger capture-release-recapture problem.

Figure 9.6 First pass at a posterior distribution for the tagged tiger problem.
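The same grid calculation can be sketched outside the spreadsheet. The following is a minimal Python version of the Figure 9.5 model, assuming scipy's hypergeom as a stand-in for VoseHypergeoProb:

```python
import numpy as np
from scipy.stats import hypergeom

n, D, x = 30, 20, 7                        # recapture sample size, tagged tigers, tagged seen
theta = np.arange(43, 151)                 # candidate population sizes tested in Figure 9.5
prior = np.ones_like(theta, dtype=float)   # discrete uniform prior

likelihood = hypergeom.pmf(x, theta, D, n) # hypergeometric likelihood l(x|theta)
posterior = prior * likelihood
posterior /= posterior.sum()               # normalise so the confidences sum to 1
print(theta[np.argmax(posterior)])         # peaks at 85, as in Figure 9.6
```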

In this example, we had to adjust our range of tested values of θ in light of the posterior distribution. It
is quite common to review the set of tested values of θ, either expanding the prior's range or modelling
some part of the prior's range in more detail when the posterior distribution is concentrated around a
small range. It is entirely appropriate to expand the range of the prior as long as we would have been
happy to have extended our prior to the new range before seeing the data. However, it would not be
appropriate if we had a much more informed prior belief that gave an absolute range for the uncertain
parameter outside of which we are now considering stepping. This would not be right because we would
be revising our prior belief in light of the data: putting the cart before the horse, if you like. However,
if the likelihood function is concentrated very much at one end of the range of the prior, it may well be
worth reviewing whether the prior distribution or the likelihood function is appropriate, since the analysis
could be suggesting that the true value of the parameter lies outside the preconceived range of the prior.
Continuing with our tigers on an island, let us imagine that the warden is unsatisfied with the level of
uncertainty that remains about the number of tigers, which, from 50 to 250, is rather large. She decides
to wait a short while and then capture another 30 tigers. The experiment is completed, and this time t
tagged tigers are captured. Assuming that a tagged tiger still has the same probability of being captured
as an untagged tiger, what is her uncertainty distribution now for the number of tigers on the island?

Figure 9.7 Improved posterior distribution for the tagged tiger problem (tigers on the island, 50 to 300).

This is simply a replication of the first problem, except that we no longer use a discrete uniform
distribution as her prior. Instead, the distribution of Figure 9.7 represents the state of her knowledge
prior to doing this second experiment, and the likelihood function is now given by the Excel function
HYPGEOMDIST(t, 30, 20, θ), equivalently VoseHypergeoProb(t, 30, 20, θ, 0). The six panels of
Figure 9.8 show what the warden's posterior distribution would have been if the second experiment had
trapped t = 1, 3, 5, 7, 10 and 15 tagged tigers instead. These posteriors are plotted together with the
prior of Figure 9.7 and the likelihood functions, normalised to sum to 1 for ease of comparison.
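A sketch of this sequential update in Python (again assuming scipy's hypergeom in place of the spreadsheet functions), in which the posterior from the first experiment becomes the prior for the second:

```python
import numpy as np
from scipy.stats import hypergeom

theta = np.arange(43, 301)
prior1 = np.ones_like(theta, dtype=float)              # discrete uniform prior
post1 = prior1 * hypergeom.pmf(7, theta, 20, 30)       # first experiment: 7 tagged out of 30
post1 /= post1.sum()                                   # this is the Figure 9.7 posterior

for t in (1, 3, 5, 7, 10, 15):                         # possible second-experiment results
    post2 = post1 * hypergeom.pmf(t, theta, 20, 30)    # old posterior acts as the new prior
    post2 /= post2.sum()
    print(t, theta[np.argmax(post2)])                  # mode of the updated posterior
```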
You might initially imagine that performing another experiment would make you more confident about
the actual number of tigers on the island, but the graphs of Figure 9.8 show that this is not necessarily
so. In the top two panels the posterior distribution is now more spread than the prior because the data
contradict the prior (the prior and likelihood peak at very different values of 0). In the middle left panel,
the likelihood disagrees moderately with the prior, but the extra information in the data compensates
for this, leaving us with about the same level of uncertainty but with a posterior distribution that is to
the right of the prior.
The middle right panel represents the scenario where the second experiment has the same results
as the first. You'll see that the prior and likelihood overlay on each other because the prior of the
first experiment was uniform and therefore the posterior shape was only influenced by the likelihood
function. Since both experiments produced the same result, our confidence is improved and remains
centred around the best guess of 85.
In the bottom two panels, the likelihood functions disagree with the priors, yet the posterior distributions have a narrower uncertainty. This is because the likelihood function is placing emphasis on the
left tail of the possible range of values for θ, which is bounded at θ = 43.
In summary, the graphs of Figure 9.8 show that the amount of information contained in data is
dependent on two things: (1) the manner in which the data were collected (i.e. the level of randomness
inherent in the collection), which is described by the likelihood function, and (2) the state of our
knowledge prior to observing the data and the degree to which it compares with the likelihood function.
If the data tell us what we are already fairly sure of, there is little information contained in the data for
us (though the data would contain much more information for those more ignorant of the parameter).
On the other hand, if the data contradict what we already know, our uncertainty may either decrease or
increase, depending on the circumstances.

Figure 9.8 Tagged tiger problem: panels (a) to (f) show the prior distributions, likelihood functions and posterior distributions if the second experiment had trapped 1, 3, 5, 7, 10 and 15 tagged tigers respectively (prior distribution shown as empty circles, likelihood function as grey lines and posterior distributions as black lines; x axis: tigers on the island).


Example 9.3

Twenty people are randomly picked off a city street in France. Whether they are male or female is noted
on 20 identical pieces of paper, put into a hat and the hat is brought to me. I have not seen these 20
people. I take out five pieces of paper from the hat and read them - three are female. I am then asked
to estimate the number of females in the original group of 20.
I can express my estimate as a confidence distribution of the possible values. I might argue that, prior
to reading the five names, I had no knowledge of the number of people who would be female and so
would assign a discrete uniform prior from 0 to 20. However, it would be better to argue that roughly
50 % of people are female and so a much better prior distribution would be a Binomial(20, 0.5). This is
equivalent to a Duniform prior, followed by a Binomial(20, 0.5) likelihood for the number of females
that would be randomly selected from a population in a sample of 20.
The likelihood function relating to sampling five people from the population is again hypergeometric, except that in this problem we know the total population (i.e. M = 20), we know the sample size (n = 5) and we know the number observed in the sample with the required property (x = 3), but we don't know the number of females D, which we denote by θ as it is the parameter to be estimated. Figure 9.9 illustrates the spreadsheet model for this problem, using the binomial distribution prior. This spreadsheet has made use of ModelRisk's VoseBinomialProb(x, n, p, cumulative), equivalently the Excel function BINOMDIST(x, n, p, cumulative), which returns a probability evaluated at x for a Binomial(n, p) distribution. The cumulative parameter in the function toggles the function to return a probability mass (cumulative = 0 or FALSE) or a cumulative probability (cumulative = 1 or TRUE). The IF statement in cells D8:D28 is unnecessary with the VoseHypergeoProb function, which returns a zero outside the feasible range, but it is necessary to avoid errors if you use Excel's HYPGEOMDIST function in its place.
Figure 9.10 shows the resultant posterior distribution, together with the likelihood function and the prior. Here we can see that the prior is very strong and the amount of information embedded in the likelihood function is small, so the posterior distribution is quite close to the prior. The posterior distribution is a sort of compromise between the prior and likelihood function, in that it finds a distribution that agrees as much as possible with both. Hence, the peak of the posterior distribution now lies somewhere between the peaks of the prior and likelihood function. The effect of the likelihood function is small because the sample is small (a sample of 5) and because it does not disagree with the prior (the prior has a maximum at θ = 10, and this value of θ also produces one of the highest likelihood function values).

I

1

4
5
-

:

(

B

C

(

D

E

9
10

11
12

25
26
27
28
29
30

Figure 9.9

F

(GI

H

1

I

1J

Parameters

6

7
8
-

1

9

0
1
2
3
4
17
18
19
20

Prior
9.5E-07
1.9E-05
1.8E-04
l.lE-03
4.6E-03
1.1E-03
1.8E-04
1.9E-05
9.5E-07

Likelihood
0
0
0
8.8E-03
3.1 E-02
1.3E-01
5.3E-02
0
0

Posterior
0
0
0
9.5E-06
1.4E-04
1.4E-04
9.5E-06
0
0
0.3125

Normalised
posterior
0
0
0
3.1E-05
4.6E-04
4.6E-04
3.1E-05
0
0

C3:C4
B8:B28
C8:C28
D8:D28
E8:E28
E29
F8:F28

Formulae table
Constants
{O,l,. . .,19,20}
=VoseBinomialProb(B8,20,0.5,0)
=IF(OR(BB20-(n-x))
,O,VoseHypergeoProb(x,n,B8,20))
=C8*D8
=SUM(E8:E28)
=E8/$E$29

Bayesian inference model for the number of "females in a hat" problem.


Figure 9.10 Prior distribution, likelihood function (normalised) and posterior distribution for the model of Figure 9.9 using a Binomial(20, 0.5) prior (x axis: females in the group).

Figure 9.11 Prior and posterior distributions for the model of Figure 9.9 with a Duniform({0, ..., 20}) prior (x axis: females in the group).

For comparison, Figure 9.11 shows the prior and posterior distributions if one had used a discrete uniform prior. Since the prior is flat in this case, it contributes nothing to the posterior's shape and the likelihood function becomes the posterior distribution. +


Hyperparameters

I assumed in Example 9.3 that the prevalence of females in France is 50 %. However, knowing that females on average live longer than males, this figure will be a slight underestimate. Perhaps I should have used a value of 51 % or 52 %. In Bayesian inference, I can include uncertainty about one or more of the parameters in the analysis. For example, I could model p with a PERT(50 %, 51 %, 52 %) distribution. Uncertain parameters of this kind are called hyperparameters. In the algebraic form of a Bayesian inference calculation, I then integrate out this nuisance parameter, which in reality can be a bit tricky to carry out. Let's look again
at the Bayesian inference calculation in the spreadsheet of Figure 9.9. If I have uncertainty about the
prevalence of females p, I should assign a distribution to its value, in which case there would then
be uncertainty about the posterior distribution. I cannot have an uncertainty about my uncertainty: it
doesn't make sense. This is why we must integrate out (i.e. aggregate) the effect of uncertainty about
p on the posterior distribution. We can do this very easily using Monte Carlo simulation, instead of the
more onerous algebraic integration. We simply include a distribution for p in our model, nominate the
entire array for the posterior distribution as an output and simulate. The set of means of the generated
values for each cell in the array constitutes the final posterior distribution.
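As a rough illustration of that simulation approach, the sketch below (Python with numpy and scipy; a triangular distribution is used here as a simple stand-in for the PERT(50 %, 51 %, 52 %) hyperprior) averages the normalised posterior array over many sampled values of p:

```python
import numpy as np
from scipy.stats import binom, hypergeom

rng = np.random.default_rng(0)
theta = np.arange(0, 21)
posteriors = []
for _ in range(10_000):
    p = rng.triangular(0.50, 0.51, 0.52)       # stand-in for the PERT hyperprior on p
    prior = binom.pmf(theta, 20, p)            # Binomial(20, p) prior for number of females
    like = hypergeom.pmf(3, 20, theta, 5)      # 3 females seen in a sample of 5 from the 20
    post = prior * like
    posteriors.append(post / post.sum())       # normalised posterior for this value of p

final_posterior = np.mean(posteriors, axis=0)  # average over the hyperparameter, cell by cell
```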
Simulating a Bayesian inference calculation

We could have done the same Bayesian inference analysis for Example 9.3 by simulation. Figure 9.12
illustrates a spreadsheet model that performs the Bayesian inference, together with a plot of the model
result.

Figure 9.12 Simulation model for the problem of Figure 9.9 (cell C4 contains =IF(C3=0,0,VoseHypergeo(5,C3,20)); the plot shows the frequency of accepted scenarios against the number of females in the group).

In cell C3, a Binomial(20, 0.5) distribution represents the prior. It is randomly generating possible
scenarios of the number of "females" in the hat. In cell C4 a sample of five people is modelled using
a Hypergeo(5, D, 20), where D is the result from the binomial distribution. The IF statement here is
unnecessary because VoseHypergeo supports D = 0 but, for example, @RISK's RiskHypergeo(5,0,20)
returns an error. This represents one-half of the likelihood function logic. Finally, in cell C5, the
generated value from the binomial distribution in cell C3 is accepted (and therefore stored in memory)
if the hypergeometric distribution produces a 3 - the number of females observed in the experiment. This
is equivalent to the second half of the likelihood function logic. By running a large number of iterations,
a large number of generated values from the binomial will be accepted. The proportion of times that a
particular value from the binomial distribution will be accepted equates to the hypergeometric probability
that three females would be subsequently observed in a random sample of five from the group. I ran
this model for 100000 iterations, and 31 343 values were accepted, which equates to about 31 % of
the iterations. The technique is interesting but does have limited applications, since, for more complex
problems or those with larger numbers, the technique becomes very inefficient as the percentage of
iterations that are accepted becomes very small indeed. It is also difficult to use where the parameter
being estimated is continuous rather than discrete, in which case one is forced to use a logic that accepts
the generated prior value if the generated result lies within some range of the observed result. However,
to combat this inefficiency, one can alter the prior distribution to generate values that the experimental
results have shown to be possible. For example, in this problem, there must be between three and 18
females in the group of 20, whereas the Binomial(20, 0.5) is generating values between 0 and 20.
Furthermore, one could run several passes, cutting down the prior with each pass to home in on only
those values that are feasible. One can also get more detail in the tails by multiplying up the mass of
some values x , y, z (for example, in the tails of the prior) by some factor, then dividing the heights of
the posterior tail at x , y and z by that factor.
While this technique consumes a lot of simulation time, the models are very simple to construct and
one can also consider multiple parameter priors.
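A bare-bones Python version of this accept/reject simulation (a sketch only, not the spreadsheet model of Figure 9.12) looks like this:

```python
import numpy as np
rng = np.random.default_rng(1)

accepted = []
for _ in range(100_000):
    d = rng.binomial(20, 0.5)                      # prior draw: females in the group of 20
    sample = rng.hypergeometric(d, 20 - d, 5)      # females seen in a sample of 5
    if sample == 3:                                # accept only draws that reproduce the data
        accepted.append(d)

values, counts = np.unique(accepted, return_counts=True)
posterior = counts / counts.sum()                  # approximates the posterior of Figure 9.10
print(len(accepted) / 100_000)                     # acceptance rate close to 0.31, as in the text
```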
Let us look again at the choice of priors for this problem, i.e. either a Duniform({0, ..., 20}) or a Binomial(20, 50 %). One might consider that the Duniform distribution is less informed (i.e. says less) than the binomial distribution. However, we can turn the Duniform distribution around and ask what it would have said about our prior belief of the probability p of a person randomly selected from the French population being female. We can show that a uniform assumption for p translates to a Duniform distribution of females in a group, as follows.
Let s_n be the number of successes in n Bernoulli trials where θ is the unknown probability of success of a trial. Then the probability that s_n = r, r = {0, 1, 2, ..., n}, is given by the de Finetti theorem:

$$P(s_n = r) = \int_0^1 \binom{n}{r}\theta^{r}(1-\theta)^{n-r} f(\theta)\,d\theta$$

where f(θ) is the probability density function for the uncertainty distribution for θ. The formula simply calculates, for any value of r, the binomial probability

$$\binom{n}{r}\theta^{r}(1-\theta)^{n-r}$$

of observing r successes, integrated over the uncertainty distribution for the binomial probability θ. If we use a Uniform(0, 1) distribution to describe our uncertainty about θ, then f(θ) = 1:

$$P(s_n = r) = \binom{n}{r}\int_0^1 \theta^{r}(1-\theta)^{n-r}\,d\theta$$

The integral is a beta function and, for integer values of r and n, we have the standard identity

$$\int_0^1 \theta^{r}(1-\theta)^{n-r}\,d\theta = \frac{r!\,(n-r)!}{(n+1)!}$$

Thus,

$$P(s_n = r) = \frac{n!}{(n-r)!\,r!}\cdot\frac{r!\,(n-r)!}{(n+1)!} = \frac{1}{n+1}$$

So each of the n + 1 possible values {0, 1, 2, ..., n} has the same likelihood of 1/(n + 1). In other words, using a Duniform prior for the number of females in a group equates to saying that we are equally confident that the true probability of an individual from the population being female is any value between 0 and 1.
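This result is easy to check by simulation; the following few lines (a Python sketch) draw p from a Uniform(0, 1) and confirm that the implied distribution of successes in 20 trials is flat:

```python
import numpy as np
rng = np.random.default_rng(2)

# Draw p ~ Uniform(0, 1), then s ~ Binomial(20, p); the counts should be flat over 0..20.
p = rng.uniform(0, 1, size=200_000)
s = rng.binomial(20, p)
freq = np.bincount(s, minlength=21) / s.size
print(freq)            # each value close to 1/21, i.e. about 0.0476
```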
Example 9.4


A magician has three cups turned over on his table. Under one of the cups you see him put a pea.
With much ceremony, he changes the cups around in a dazzling swirl. He then offers you a bet to pick
which cup the pea is under. You pick one. He then shows you under one of the other cups - empty.
The magician asks you whether you would like to swap your choice for the third, untouched cup. What
is your answer? Note that the magician knows which cup has the pea and would not turn it over.
In this problem, until the magician turns over a cup, we are equally sure about which cup has the
pea so our prior confidence assigns equal weighting to the three cups. We now need to calculate the
probability of what was observed if the pea had been under each of the cups in turn. We can label the
three cups as A for the cup I chose, B for the cup the magician chose and C for the remaining cup.
Let's start with the easy cup, B. What is the probability that the magician would turn over cup B if
he knew the pea was under cup B? Answer: 0, because he would have spoiled the trick.
Next, look at the untouched cup, C. What is the probability that the magician would turn over cup
B if he knew the pea was under cup C? Answer: 1, since he had no choice as I had already picked A,
and C contained the pea.
Now look at my cup, A. What is the probability that the magician would turn over cup B if he knew
the pea was under cup A? Answer: 1/2, since he could have chosen to turn over either B or C.
Thus, from Bayes' theorem,

$$P(A\,|\,X) = \frac{P(X\,|\,A)\,P(A)}{P(X\,|\,A)\,P(A) + P(X\,|\,B)\,P(B) + P(X\,|\,C)\,P(C)}$$

and similarly for cups B and C, where P(A) = P(B) = P(C) = 1/3 are the confidences we assign to the three cups before observing the data X (i.e. the magician turning over cup B) and P(X|A) = 0.5, P(X|B) = 0 and P(X|C) = 1. Thus,

$$P(A\,|\,X) = \tfrac{1}{3}, \qquad P(B\,|\,X) = 0 \qquad \text{and} \qquad P(C\,|\,X) = \tfrac{2}{3}$$

So, after having made our choice of cup and then watching the magician turn over one of the other two cups, we should always change our mind and pick the third cup, as we should now be twice as confident that the untouched cup contains the pea as the one we originally chose. The result is a little hard for many people to believe: the obstinate among us would like to stick to our original choice, and it does not seem that the probability can really have changed for the cup we chose to contain the pea. Indeed, the probability has not changed after the magician's selection: it remains either 0 or 1, depending on whether we picked the right cup. What has changed is our confidence (the state of our knowledge) about whether that probability is 1. Originally, we had a 1/3 confidence that the pea was under our cup, and that has not changed. There is another way to think of the same problem: we had 1/3 confidence in our original choice of cup, and 2/3 in the other choices, and we also knew that one of those other cups did not contain the pea, so the 2/3 migrated to the remaining cup that was not turned over. This exercise is known as the Monty Hall problem - Wikipedia has a nice explanatory page, and www.stat.sc.edu/~west/javahtml/letsMakeaDeal.html has a nice simulation applet to test out the answer. +
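In the spirit of the simulation applet mentioned above, here is a small Python sketch of the game that compares the "stick" and "switch" strategies:

```python
import random
random.seed(3)

def play(switch: bool, trials: int = 100_000) -> float:
    wins = 0
    for _ in range(trials):
        pea = random.randrange(3)                       # cup hiding the pea
        choice = random.randrange(3)                    # our initial pick
        # magician reveals an empty cup that is neither the pea nor our choice
        revealed = next(c for c in range(3) if c != pea and c != choice)
        if switch:
            choice = next(c for c in range(3) if c != choice and c != revealed)
        wins += (choice == pea)
    return wins / trials

print(play(switch=False))   # close to 1/3
print(play(switch=True))    # close to 2/3
```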
Exercise 9.1: Try repeating this problem where there are (a) four cups and one pea, and (b) five
cups and two peas. Each time you get to select a cup, and each time the magician turns one of the
others over.

9.2.2 Prior distributions
As we have seen above, the prior distributions are the description of one's state of knowledge about
the parameter in question prior to observation of the data. Determination of the prior distribution is
the primary focus for criticism of Bayesian inference, and one needs to be quite sure of the effects
of choosing one particular prior over another. This section describes three different types of prior
distribution: the uninformed prior; the conjugate prior and the subjective prior. We will look at the
practical reasons for selecting each type and arguments for and against each selection.
An argument presented by frequentist statisticians (i.e. those who use only traditional statistical techniques) is that the Bayesian inference methodology is subjective. A frequentist might argue that, because
we use prior distributions, representing the state of one's belief prior to accumulation of data, Bayesian
inference may easily produce quite different results from one practitioner to the next because they can
choose quite different priors. This is, of course, true - in principle. It is both one of the strengths and certainly an Achilles' heel of the technique. On the one hand, it is very useful in a statistical technique to be
able to include one's prior experience and knowledge of the parameter, even if that is not available in a
pure data form. On the other hand, one party could argue that the resultant posterior distribution produced
by another party was incorrect. The solution to this dilemma is, in principle, fairly simple. If the purpose
of the Bayesian inference is to make internal decisions within your organisation, you are very much at
liberty to use any experience you have available to determine your prior. On the other hand, if the result
of your analysis is likely to be challenged by a party with a conflicting agenda to your own, you may
be better off choosing an "uninformed" prior, i.e. one that is neutral in that it provides no extra information. All that said, in the event that one has accumulated a reasonable dataset, the controversy regarding
selection of priors disappears as the prior is overwhelmed by the information contained in the data.
It is important to specify a prior with a sufficiently large range to cover all possible true values for the
parameter, as we have seen in Figure 9.6. Failure to specify a wide enough prior will curtail the posterior
distribution, although this will nearly always be apparent when plotting the posterior distribution and a
correction can be made. The only time it may not be apparent that the prior range is inadequate is when
the likelihood function has more than one peak, in which case one might have extended the range of
the prior to show the first peak but no further.


Uninformed priors

An uninformed prior has a distribution that would be considered to add no information to the Bayesian inference, except to specify the possible range of the parameter in question. For example, a Uniform(0, 1) distribution could be considered an uninformed prior when estimating a binomial probability because it states that, prior to collection of any data, we consider every possible value for the true probability to be as likely as every other. An uninformed prior is often desirable in the development of public policy to demonstrate impartiality. Laplace (1812), who also independently stated Bayes' theorem (Laplace, 1774) 11 years after Bayes' essay was published (he apparently had not seen Bayes' essay), proposed that public policy priors should assume all allowable values to have equal likelihood (i.e. uniform or Duniform distributions).
At first glance, then, it might seem that uninformed priors will just be uniform distributions running across the entire range of possible values for the parameter. That this is not true can be easily demonstrated with the following example. Consider the task of estimating the true mean number of events per unit exposure λ of a Poisson process. We have observed a certain number of events within a certain period, which we can use to give us a likelihood function very easily (see Example 9.6). It might seem reasonable to assign a Uniform(0, z) prior to λ, where z is some large number. However, we could just as easily have parameterised the problem in terms of β, the mean exposure between events. Since β = 1/λ, we can quickly check what a Uniform(0, z) prior for λ would look like as a prior for β by running a simulation on the formula =1/Uniform(0, z). Figure 9.13 shows the result of such a simulation. It is alarmingly far from being uninformed with respect to β! Of course, the reverse equally applies: if we had performed a Bayesian inference on β with a uniform prior, the prior for λ would be just as far from being uninformed. The probability density function for the prior distribution of a parameter must be known in order to perform a Bayesian inference calculation. However, one can often choose between a number of different parameterisations that would equally well describe the same stochastic process. For example, one could describe a Poisson process by λ, the mean number of events per unit exposure, by β, the mean exposure between events as above, or by P(x > 0), the probability of at least one event in a unit of exposure.

Figure 9.13 Distribution resulting from the formula =1/Uniform(0, 20).
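The point of Figure 9.13 can be reproduced in a few lines; the following is a Python sketch of the simulation just described:

```python
import numpy as np
rng = np.random.default_rng(4)

z = 20
lam = rng.uniform(0, z, size=100_000)      # "uninformed" Uniform(0, z) prior for lambda
beta = 1 / lam                             # the same prior expressed for beta = 1/lambda
hist, edges = np.histogram(beta, bins=[0.05, 0.1, 0.2, 0.5, 1, 2, 5, 1000])
print(dict(zip(np.round(edges[:-1], 2), hist / beta.size)))
# the implied prior for beta piles up near 1/z and has a very long right tail - anything but flat
```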
The Jacobian transformation lets us calculate the prior distribution for a Bayesian inference problem after reparameterising. If x is the original parameter with probability density function f(x) and cumulative distribution function F(x), and y is the new parameter with probability density function f(y) and cumulative distribution function F(y), related to x by some function such that x and y change monotonically together, then we can equate the changes dF(y) and dF(x), i.e.

$$f(y)\,dy = f(x)\,dx$$

Rearranging a little, we get

$$f(y) = f(x)\left|\frac{dx}{dy}\right|$$

where |dx/dy| is known as the Jacobian.
So, for example, if x = Uniform(0, c) and y = 1/x, then f(x) = 1/c and

$$\frac{dx}{dy} = -\frac{1}{y^2} \qquad \text{so the Jacobian is} \qquad \left|\frac{dx}{dy}\right| = \frac{1}{y^2}$$

which gives the distribution for y: p(y) = 1/(c y²).

Two advanced exercises for those who like algebra:
Exercise 9.2: Suppose we model p = U(0, 1). What is the density function for Q = 1 - (1 - p)^n?

Exercise 9.3: Suppose we want to model P(0) = exp(-λ) = U(0, 1). What is the density function for λ?
There is no all-embracing solution to the problem of setting uninformed priors that don't become "informed" under some reparameterising of the problem. However, one useful method is to use a prior such that log10(θ) is Uniform(-z, z) distributed, which, using the Jacobian transformation, can be shown to give the prior density π(θ) ∝ 1/θ for a parameter that can take any positive real value. We could just as easily use natural logs, i.e. loge(θ) = Uniform(-y, y), but in practice it is easier to set the value z because our minds think quite naturally in powers of 10. Using this prior, we get log10(1/θ) = -log10(θ) = -Uniform(-z, z) = Uniform(-z, z). In other words, 1/θ is distributed the same as θ: in mathematical terminology, the prior distribution is transformation invariant. Now, if log10(θ) is Uniform(-z, z) distributed, then θ is distributed as 10^Uniform(-z, z). Figure 9.14 shows a graph of π(θ) = 1/θ. You probably wouldn't describe that distribution as very uninformed, but it is arguably the best one can do for this particular problem. It is worth remembering too that, if there is a reasonable amount of data available, the likelihood function l(X|θ) will overpower the prior π(θ) = 1/θ, and then the shape of the prior becomes unimportant. This will occur much more quickly if the likelihood function is at a maximum in a region of θ where the prior is flatter: anywhere from 3 or 4 onwards in Figure 9.14, for example.

Figure 9.14 Prior distribution π(θ) = 1/θ.
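For completeness, a short check of that claim, using the Jacobian transformation of the previous paragraphs:

$$u = \log_{10}\theta \sim \text{Uniform}(-z, z), \qquad f(u) = \frac{1}{2z}, \qquad \frac{du}{d\theta} = \frac{1}{\theta\ln 10}$$

$$\pi(\theta) = f(u)\left|\frac{du}{d\theta}\right| = \frac{1}{2z\ln 10}\cdot\frac{1}{\theta} \;\propto\; \frac{1}{\theta}, \qquad 10^{-z} \le \theta \le 10^{z}$$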
Another requirement might be to ensure that the prior distribution remains invariant under some rescaling. For example, the location parameter of a distribution should have the same effective prior under the linear shifting transformation y = θ - a, where a is some constant. This is achieved if we select a uniform prior for θ, i.e. π(θ) = constant. Similarly, a scale parameter should have a prior that is invariant under a change of units, i.e. y = kθ, where k is some constant. In other words, we require that the parameter be invariant under a linear transformation, which, from the discussion in the previous paragraph, is achieved if we select the prior log(θ) = uniform (i.e. π(θ) ∝ 1/θ) on the real line, since log(y) = log(kθ) = log(k) + log(θ), which is still uniformly distributed.
Parametric distributions often have either or both a location parameter and a scale parameter. If more than one parameter is unknown and one is attempting to estimate these parameters, it is common practice to assume independence between the parameters in the prior: the logic is that an assumption of independence is more uninformed than an assumption of any specific degree of dependence. The joint prior for a scale parameter and a location parameter is then simply the product of the two priors. So, for example, the prior for the mean of a normal distribution is π(μ) ∝ 1, as μ is a location parameter; the prior for the standard deviation of the normal distribution is π(σ) ∝ 1/σ, as σ is a scale parameter; and their joint prior is given by the product of the two priors, i.e. π(μ, σ) ∝ 1/σ. The use of joint priors is discussed more fully in Chapter 10, where we will be fitting distributions to data.

Jeffreys prior

The Jeffreys prior, described in Jeffreys (1961), provides an easily computed prior that is invariant under any one-to-one transformation and therefore determines one version of what could be described as an uninformed prior. The idea is that one finds a likelihood function, under some transformation of the data, that produces the same shape for all datasets and simply changes the location of its peak. Thus, a non-informative prior in this translation would be ambiguous, i.e. flat. Although it is often impossible to determine such a likelihood function, Jeffreys developed a useful approximation given by

$$\pi(\theta) \propto \sqrt{I(\theta)}$$

where I(θ) is the expected Fisher information in the model:

$$I(\theta) = -E_x\left[\frac{\partial^2 \log l(x\,|\,\theta)}{\partial\theta^2}\right]$$

The formula is averaging, over all values of x (the data), the second-order partial derivative of the loglikelihood function. The form of the likelihood function is helping determine the prior, but the data themselves are not. This is important since the prior must be "blind" to the data. [Interestingly, empirical Bayes methods (another field of Bayesian inference, though not discussed in this book) do use the data to determine the prior distribution and then try to make appropriate corrections for the bias this creates.]
Some of the Jeffreys prior results are a little counterintuitive. For example, the Jeffreys prior for a binomial probability is the Beta(1/2, 1/2) shown in Figure 9.15. It peaks at p = 0 and p = 1, dipping to its lowest value at p = 0.5, which does not equate well with most people's intuitive notion of uninformed. The Jeffreys prior for the Poisson mean λ is π(λ) ∝ 1/λ^(1/2). But, using the Jacobian transformation, we see that this gives a prior for β = 1/λ of π(β) ∝ β^(-3/2), so the prior is not transformation invariant.
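As an illustration of how the Jeffreys prior is computed, one standard way to see the Beta(1/2, 1/2) result for the binomial case is:

$$\log l(x\,|\,\theta) = x\ln\theta + (n-x)\ln(1-\theta) + \text{constant}, \qquad \frac{\partial^2 \log l}{\partial\theta^2} = -\frac{x}{\theta^2} - \frac{n-x}{(1-\theta)^2}$$

$$I(\theta) = \frac{n\theta}{\theta^2} + \frac{n - n\theta}{(1-\theta)^2} = \frac{n}{\theta(1-\theta)}, \qquad \pi(\theta) \propto \sqrt{I(\theta)} \propto \theta^{-1/2}(1-\theta)^{-1/2}$$

which is the Beta(1/2, 1/2) density up to a normalising constant.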
Improper priors

We have seen how a uniform prior can be used to represent uninformed knowledge about a parameter. However, if that parameter can take on any value between zero and infinity, for example, then it is not strictly possible to use the uniform prior π(θ) = c, where c is some constant, since no value of c will let the area of the distribution sum to 1, and the prior is called improper. Other common improper priors include using 1/σ for the standard deviation of a normal distribution and 1/σ² for the variance. It turns out that we can use improper priors provided the denominator in Equation (9.8) equals some constant (i.e. is not infinite), because this means that the posterior distribution can be normalised.

Figure 9.15 The Beta(1/2, 1/2) distribution.


Savage et al. (1962) has pointed out that an uninformed prior can be uniformly distributed over the
area of interest, then slope smoothly down to zero outside the area of interest. Such a prior can, of
course, be designed to have an area of 1, eliminating the need for improper priors. However, the extra
effort required in designing such a prior is not really necessary if one can accept using an improper
prior.
Hyperpriors

Occasionally, one may wish to specify a prior that itself has one or more uncertain parameters. For
instance, in Example 9.3 we used a Binomial(20, 0.5) prior because we believed that about 50 % of the
population were female, and we discussed the effect of changing this value to a distribution representing
the uncertainty about the true female prevalence. Such a distribution is described as a hyperprior for
the hyperparameter p in Binomial(20, p). As previously discussed, Bayesian inference can account
for hyperpriors, but we are then required to do an integration over all values of the hyperparameter
to determine the shape of the prior, and that can be time consuming and at times very difficult. An
alternative to the algebraic approach is to find the prior distribution by Monte Carlo simulation. We
run a simulation for this model, naming as outputs the array of cells calculating the prior. At the end
of the simulation, we collect the mean values for each output cell, which together form our prior. The
posterior distribution will naturally have a greater spread if there is uncertainty about any parameters in
the prior. If we had used a Beta(a, b) distribution for p, the prior would have been a Beta-Binomial(20, a, b) distribution, and a beta-binomial distribution always has a greater spread than the best-fitting binomial.
Theoretically, one could continue applying uncertainty distributions to the parameters of hyperpriors,
etc., but there is little if any accuracy to be gained by doing so, and the model starts to seem pretty silly.
It is also worth remembering that the likelihood function often quickly overpowers the prior distribution
as more data become available, so the effort expended in subtle changes to defining a prior will often
be wasted.
Conjugate priors

A conjugate prior has the same functional form in θ as the likelihood function, which leads to a posterior distribution belonging to the same distribution family as the prior. For example, the Beta(α1, α2) distribution has probability density function f(θ) given by

$$f(\theta) = \frac{\theta^{\alpha_1 - 1}(1-\theta)^{\alpha_2 - 1}}{\int_0^1 t^{\alpha_1 - 1}(1-t)^{\alpha_2 - 1}\,dt}$$

The denominator is a constant for particular values of α1 and α2, so we can rewrite the equation as

$$f(\theta) \propto \theta^{\alpha_1 - 1}(1-\theta)^{\alpha_2 - 1}$$

If we had observed s successes in n trials and were attempting to estimate the true probability of success p, the likelihood function l(s, n; θ) would be given by the binomial distribution probability mass function, written (using θ to represent the unknown parameter p) as

$$l(s, n; \theta) = \binom{n}{s}\theta^{s}(1-\theta)^{n-s}$$

Since the binomial coefficient $\binom{n}{s}$ is constant for the given dataset (i.e. known n, s), we can rewrite the equation as

$$l(s, n; \theta) \propto \theta^{s}(1-\theta)^{n-s}$$

We can see that the beta distribution and the binomial likelihood function have the same functional form in θ, i.e. θ^a (1 - θ)^b, where a and b are constants. Since the posterior distribution is a product of the prior and likelihood function, it too will have the same functional form, i.e. using Equation (9.9) we have

$$f(\theta\,|\,s, n) \propto \theta^{\alpha_1 - 1 + s}(1-\theta)^{\alpha_2 - 1 + n - s} \qquad (9.10)$$

Since this is a true distribution, it must normalise to 1, so the probability density function is actually

$$f(\theta\,|\,s, n) = \frac{\theta^{\alpha_1 - 1 + s}(1-\theta)^{\alpha_2 - 1 + n - s}}{\int_0^1 t^{\alpha_1 - 1 + s}(1-t)^{\alpha_2 - 1 + n - s}\,dt}$$

which is just the Beta(α1 + s, α2 + n - s) distribution. (In fact, with a bit of practice, one starts to recognise distributions by their functional form, e.g. that Equation (9.10) represents a beta distribution, without having to go through the step of obtaining the normalised equation.) Thus, if one uses a beta distribution as a prior for p with a binomial likelihood function, the posterior distribution is also a beta. The value of using conjugate priors is that we can avoid actually doing any of the mathematics and get directly to the answer. Conjugate priors are often called convenience priors for obvious reasons.
The Beta(1, 1) distribution is exactly the same as a Uniform(0, 1) distribution, so, if we want to start with a Uniform(0, 1) prior for p, our posterior distribution is given by Beta(s + 1, n - s + 1). This is a particularly useful result that will be used repeatedly in this book. By comparison, the Jeffreys prior for a binomial probability is a Beta(1/2, 1/2). Haldane (1948) discusses using a Beta(0, 0) prior, which is mathematically undefined and therefore meaningless by itself, but gives a posterior distribution of Beta(s, n - s) that has a mean of s/n: in other words, it provides an unbiased estimate for the binomial probability.
Table 9.1 lists other conjugate priors and the associated likelihood functions. Morris (1983) has shown that exponential families of distributions, from which one often draws the likelihood function, all have conjugate priors, so the technique can be used frequently in practice. Conjugate priors are also often used to provide approximate but very convenient representations of subjective priors, as described in the next section.
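The conjugacy result is easy to verify numerically; the sketch below (Python, assuming scipy) compares a grid-computed posterior with the Beta(α1 + s, α2 + n - s) answer:

```python
import numpy as np
from scipy.stats import beta, binom

a1, a2 = 1, 1           # Beta(1, 1) prior, i.e. Uniform(0, 1)
s, n = 3, 5             # observed successes and trials

theta = np.linspace(0.001, 0.999, 999)
post = beta.pdf(theta, a1, a2) * binom.pmf(s, n, theta)   # prior x likelihood
post /= np.trapz(post, theta)                             # normalise numerically

conjugate = beta.pdf(theta, a1 + s, a2 + n - s)           # Beta(s+1, n-s+1) in this case
print(np.max(np.abs(post - conjugate)))                   # differences reflect grid error only
```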

Subjective priors

A subjective prior (sometimes called an elicited prior) describes the informed opinion of the value of a
parameter prior to the collection of data. Chapter 14 discusses in some depth the techniques for eliciting
opinions. A subjective prior can be represented as a series of points on a graph, as shown in Figure 9.16.
It is a simple enough exercise to read off a number of points from such graphs and use the height of
each point as a substitute for π(θ). That makes it quite difficult to normalise the posterior distribution,

but we will see in Section 9.2.4 a technique that one can use in Monte Carlo modelling that removes that problem.
Sometimes it is possible reasonably to match a subjective opinion like that of Figure 9.16 to a convenience prior for the likelihood function one is intending to use. Software products like ModelRisk, BestFit and RiskView Pro can help in this regard. An exact match is not usually important because (a) the subjective prior is not usually specified that accurately anyway and (b) the prior has progressively less influence on the posterior the larger the set of data used in calculating the likelihood function. At other times, a single conjugate prior may be inadequate for describing a subjective prior, but a composite of two or more conjugate priors will produce a good representation.

Figure 9.16 Example of a subjective prior (x axis: weight of statue, roughly 20 to 40).

Table 9.1 Likelihood functions and their associated conjugate priors.

Distribution | Probability density function | Estimated parameter | Prior | Posterior
Binomial | C(n, x) p^x (1 - p)^(n-x) | Probability p | Beta(α1, α2) | α1' = α1 + x; α2' = α2 + n - x
Exponential | λ e^(-λx) | Mean⁻¹ = λ | Gamma(α, β) | α' = α + n; β' = β / (1 + β Σxi)
Normal (with known σ) | (1/(√(2π)σ)) exp[-½((x - μ)/σ)²] | Mean μ | Normal(μ0, σ0) | μ0' = (μ0 σ²/n + x̄ σ0²) / (σ0² + σ²/n); σ0' = √(σ0²(σ²/n) / (σ0² + σ²/n))
Poisson | e^(-λt) (λt)^x / x! | Mean events per unit time λ | Gamma(α, β) | α' = α + x; β' = β / (1 + β t)


Multivariate priors

I have concentrated discussion of the quantification of uncertainty in this chapter on a single parameter θ. In practice one may find that θ is multivariate, i.e. that it is multidimensional, in which case one
needs multivariate priors. In general, such techniques are beyond the scope of this book, and the reader
is referred to more specialised texts on Bayesian inference: I have listed some texts I have found useful
(and readable) in Appendix IV. Multivariate priors are, however, discussed briefly with respect to fitting
distributions to data in Section 10.2.2.

9.2.3 Likelihood functions
The likelihood function l(X|θ) is a function of θ with the data X fixed. It calculates the probability of observing the data X as a function of θ. Sometimes the likelihood function is simple: often it is just the probability distribution function of a distribution like the binomial, Poisson or hypergeometric. At other times, it can quickly become very complex.
Examples 9.2, 9.3 and 9.6 to 9.8 illustrate some different likelihood functions. As likelihood functions are calculating probabilities (or probability densities), they can be combined in the same way as we usually do in probability calculus, discussed in Section 6.3.
The likelihood principle states that all relevant evidence about θ from an experiment and its observed outcome should be present in the likelihood function. For example, in binomial sampling with n fixed, s is binomially distributed for a given p. If s is fixed, n is negative binomially distributed for a given p. In both cases the likelihood function is proportional to p^s (1 - p)^(n-s), i.e. it is independent of how the sampling was carried out and dependent only on the type of sampling and the result.

9.2.4 Normalising the posterior distribution
A problem often faced by those using Bayesian inference is the difficulty of determining the normalising
integral that is the denominator of Equation (9.8). For all but the simplest likelihood functions this
can be a complex equation. Although sophisticated commercial software products like Mathematica, Mathcad and Maple are available to perform these equations for the analyst, many integrals remain intractable and have to be solved numerically. This means that the calculation has to be redone every time new data are acquired or a slightly different problem is encountered.
For the risk analyst using Monte Carlo techniques, the normalising part of the Bayesian inference analysis can be bypassed altogether. Most Monte Carlo packages offer two functions that enable us to do this: a Discrete({x}, {p}) distribution and a Relative(min, max, {x}, {p}) distribution. The first defines a discrete distribution where the allowed values are given by the {x} array and the relative likelihood of each of these values is given by the {p} array. The second function defines a continuous distribution with a minimum = min, a maximum = max and several x values given by the array {x}, each of which has a relative likelihood "density" given by the {p} array. The reason that these two functions are so useful is that the user is not required to ensure that for the discrete distribution the probabilities in {p} sum to 1 and for the relative distribution the area under the curve equals 1. The functions normalise themselves automatically.
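A rough Python analogue of such a self-normalising Discrete({x}, {p}) function (a sketch only; the Monte Carlo packages' own implementations will differ):

```python
import numpy as np
from scipy.stats import hypergeom

rng = np.random.default_rng(5)

def discrete(values, weights, size):
    """Rough analogue of Discrete({x}, {p}): the weights need not sum to 1."""
    w = np.asarray(weights, dtype=float)
    return rng.choice(values, size=size, p=w / w.sum())    # normalisation handled here

# Unnormalised posterior weights from the tiger model (uniform prior x hypergeometric likelihood)
theta = np.arange(43, 301)
weights = hypergeom.pmf(7, theta, 20, 30)
draws = discrete(theta, weights, size=10_000)              # samples from the normalised posterior
print(draws.mean(), np.percentile(draws, [5, 95]))
```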

9.2.5 Taylor series approximation to a Bayesian posterior distribution
When we have a reasonable amount of data with which to calculate the likelihood function, the posterior
distribution tends to come out looking approximately normally distributed. In this section we will examine why that is, and provide a shorthand method to determine the approximating normal distribution
directly without needing to go through a complete Bayesian analysis.
Our best estimate θ0 of the value of a parameter θ is the value for which the posterior distribution f(θ) is at its maximum. Mathematically, this equates to the condition

$$\left.\frac{df(\theta)}{d\theta}\right|_{\theta_0} = 0 \qquad (9.11)$$

That is to say, θ0 occurs where the gradient of f(θ) is zero. Strictly speaking, we also require that the gradient of f(θ) go from positive to negative for θ0 to be a maximum, i.e.

$$\left.\frac{d^2 f(\theta)}{d\theta^2}\right|_{\theta_0} < 0$$

The second condition is only of any importance if the posterior distribution has two or more peaks, for which a normal approximation to the posterior distribution would be inappropriate anyway. Taking the first and second derivatives of f(θ) assumes that θ is a continuous variable, but the principle applies equally to discrete variables, in which case we are just looking for the value of θ for which the posterior distribution has the highest value.
The Taylor series expansion of a function (see Section 6.3.6) allows one to produce a polynomial approximation to some function f(x) about some value x0 that usually has a much simpler form than the original function. The Taylor series expansion says

$$f(x) \approx \sum_{m=0}^{\infty} \frac{f^{(m)}(x_0)}{m!}(x - x_0)^m$$

where f^(m)(x) represents the mth derivative of f(x) with respect to x.
To make the next calculation a little easier to manage, we first define the log of the posterior distribution L(θ) = loge[f(θ)]. Since L(θ) increases with f(θ), the maximum of L(θ) occurs at the same value of θ as the maximum of f(θ). We now apply the Taylor series expansion of L(θ) about θ0 (the MLE) for the first three terms:

$$L(\theta) \approx L(\theta_0) + \left.\frac{dL(\theta)}{d\theta}\right|_{\theta_0}(\theta - \theta_0) + \frac{1}{2}\left.\frac{d^2 L(\theta)}{d\theta^2}\right|_{\theta_0}(\theta - \theta_0)^2$$

The first term in this expansion is just a constant value (k) and tells us nothing about the shape of L(θ); the second term equals zero from Equation (9.11), so we are left with the simplified form

$$L(\theta) \approx k + \frac{1}{2}\left.\frac{d^2 L(\theta)}{d\theta^2}\right|_{\theta_0}(\theta - \theta_0)^2$$

This approximation will be good providing the higher-order terms (m = 3, 4, etc.) have much smaller values than the m = 2 term here.

We can now take the exponential of L(θ) to get back to f(θ):

$$f(\theta) \approx K \exp\left(\frac{1}{2}\left.\frac{d^2 L(\theta)}{d\theta^2}\right|_{\theta_0}(\theta - \theta_0)^2\right)$$

where K is a normalising constant. Now, the Normal(μ, σ) distribution has probability density function f(x) given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

Comparing the above two equations, we can see that f(θ) has the same functional form as a normal distribution, where

$$\mu = \theta_0 \qquad \text{and} \qquad \sigma = \left[-\left.\frac{d^2 L(\theta)}{d\theta^2}\right|_{\theta_0}\right]^{-1/2}$$

and we can thus often approximate the Bayesian posterior distribution with the following normal distribution:

$$\theta \approx \text{Normal}\left(\theta_0,\; \left[-\left.\frac{d^2 L(\theta)}{d\theta^2}\right|_{\theta_0}\right]^{-1/2}\right) \qquad (9.13)$$

We shall illustrate this normal (or quadratic) approximation with a few simple examples.
Example 9.5 Approximation to the beta distribution

We have seen above that the Beta(s + 1, n - s + 1) distribution provides an estimate of the binomial probability p when we have observed s successes in n independent trials, assuming a prior Uniform(0, 1) distribution. The posterior density has the functional form

$$f(\theta) \propto \theta^{s}(1-\theta)^{n-s}$$

Taking logs gives

$$L(\theta) = s\ln\theta + (n - s)\ln(1 - \theta) + k$$

and

$$\frac{dL(\theta)}{d\theta} = \frac{s}{\theta} - \frac{n-s}{1-\theta}, \qquad \frac{d^2 L(\theta)}{d\theta^2} = -\frac{s}{\theta^2} - \frac{n-s}{(1-\theta)^2}$$

We first find our best estimate θ0 of θ:

$$\frac{s}{\theta_0} - \frac{n-s}{1-\theta_0} = 0$$

which gives the intuitively encouraging answer

$$\theta_0 = s/n$$

i.e. our best guess for the binomial probability is the proportion of trials that were successes.
Next, we find the standard deviation σ for the normal approximation to this beta distribution:

$$\left.\frac{d^2 L(\theta)}{d\theta^2}\right|_{\theta_0} = -\frac{s}{\theta_0^2} - \frac{n-s}{(1-\theta_0)^2} = -\frac{n}{\theta_0(1-\theta_0)}$$

which gives

$$\sigma = \left[\frac{\theta_0(1-\theta_0)}{n}\right]^{1/2}$$

and so we get the approximation

$$\theta \approx \text{Normal}\left(\theta_0,\; \left[\frac{\theta_0(1-\theta_0)}{n}\right]^{1/2}\right) \qquad (9.14)$$

The equation for σ allows us some useful insight into the behaviour of the beta distribution. We can see in the numerator that the spread of the beta distribution, and therefore our measure of uncertainty about the true value of θ, is a function of our best estimate for θ. The function θ0(1 - θ0) is at its maximum when θ0 = 1/2, so, for a given number of trials n, we will be more uncertain about the true value of θ if the proportion of successes is close to 1/2 than if it were closer to 0 or 1. Looking at the denominator, we see that the degree of uncertainty, represented by σ, is proportional to n^(-1/2). We will see time and again that the level of uncertainty of some parameter is inversely proportional to the square root of the amount of data available. Note also that Equation (9.14) is exactly the same as the classical statistics result of Equation (9.7). But when is this quadratic approximation to L(θ), i.e. the normal approximation to f(θ), a reasonably good fit? The mean μ and variance V of a Beta(s + 1, n - s + 1) distribution are as follows:

$$\mu = \frac{s+1}{n+2}, \qquad V = \frac{(s+1)(n-s+1)}{(n+2)^2(n+3)}$$

Comparing these identities with Equation (9.13), we can see that the normal approximation works when s and (n - s) are both sufficiently large for adding 1 to s and adding 3 to n proportionally to have little effect, i.e. when

$$\frac{s+1}{s} \approx 1 \qquad \text{and} \qquad \frac{n+3}{n} \approx 1$$

Figure 9.17 compares the beta distribution with its normal approximation for several combinations of s successes in n trials. +
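A quick numerical check of this approximation (a Python sketch, assuming scipy):

```python
import numpy as np
from scipy.stats import beta, norm

s, n = 10, 40                                  # e.g. 10 successes in 40 trials
theta0 = s / n
sigma = np.sqrt(theta0 * (1 - theta0) / n)

theta = np.linspace(0.01, 0.6, 200)
exact = beta.pdf(theta, s + 1, n - s + 1)      # Bayesian posterior with a Uniform(0, 1) prior
approx = norm.pdf(theta, theta0, sigma)        # normal approximation of Equation (9.14)
print(np.max(np.abs(exact - approx)))          # the two curves are close when s and n - s are large
```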


Example 9.6 Uncertainty of λ in a Poisson process

The numbers of earthquakes that have occurred in a region of the Pacific during the last 20 years are shown in Table 9.2. What is the probability that there will be more than 10 earthquakes next year?

Table 9.2 Pacific earthquakes: the number of earthquakes observed in each year from 1979 to 1998.

Let us assume that the earthquakes come from a Poisson process (they probably don't, I admit, since one big earthquake can release built-up pressure and give a hiatus until the next one), i.e. that there is a constant probability per unit time of an earthquake and that all earthquakes are independent of each other.

other. If such an assumption is acceptable, then we need to determine the value of the Poisson process
parameter h, the theoretical true mean number of earthquakes there would be per year. Assuming no
prior knowledge, we can proceed with a Bayesian analysis, labelling h = 0 as the parameter to be
estimated. The prior distribution should be uninformed, which, as discussed in Section 9.2.2, leads
us to use a prior n(0) = 110. The likelihood function Z(01X) for the xi observations in n years is
given by

which gives a posterior function

Taking logs gives

Our best estimate O0 is determined by

which gives

242

Risk Analysis

and the standard deviation for the normal approximation is given by

since

which gives our estimate for h:

Again this solution makes sense, and again we see that the uncertainty decreases proportionally to the
square root of the amount of data n. The central limit theorem (see Section 6.3.3) says that, for large
n, the uncertainty about the true mean μ of a population can be described as

    μ = Normal(x̄, s/√n)

where x̄ is the mean and s is the standard deviation of the data sampled from the parent distribution.
The Poisson distribution has a variance equal to its mean λ, and therefore a standard deviation equal to
√λ. As Σx_i gets large, so the "−1" in the above formula for θ0 gets progressively less important
and θ0 gets closer and closer to the mean of the observations per period x̄, and we see that the Bayesian
approach and the central limit theorem of classical statistics converge to the same answer. Σ_i x_i will
be large when either λ is large, so each x_i is large, or when there are a lot of data (i.e. n is large), so
that the sum of a lot of small x_i is still large. Figure 9.18 provides three estimates of λ, the true mean
number of earthquakes for the system, given the data for earthquakes for the last 20 years, namely: the
standard Bayesian approach, the normal approximation to the Bayesian and the central limit theorem
approximation. +
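The three estimates can be sketched in Python as follows (an illustration only: the annual counts below are hypothetical, since the values of Table 9.2 are not reproduced here, and scipy is assumed):

    # Three estimates of lambda for Example 9.6 (hypothetical annual counts).
    import numpy as np
    from scipy import stats

    counts = np.array([3, 5, 2, 4, 6, 3, 5, 4, 2, 7, 3, 4, 5, 6, 2, 3, 4, 5, 3, 4])
    n, S = len(counts), counts.sum()

    # Exact Bayesian posterior with prior 1/theta: f(theta) ~ theta^(S-1) e^(-n theta),
    # which is a Gamma(S, 1/n) distribution.
    posterior = stats.gamma(a=S, scale=1 / n)

    # Normal approximation to the posterior.
    theta0 = (S - 1) / n
    normal_approx = stats.norm(theta0, np.sqrt(theta0 / n))

    # Central limit theorem estimate.
    clt = stats.norm(counts.mean(), counts.std(ddof=1) / np.sqrt(n))

    for q in (0.05, 0.5, 0.95):
        print(q, posterior.ppf(q), normal_approx.ppf(q), clt.ppf(q))

    # Probability of more than 10 earthquakes next year, averaging over lambda:
    lam = posterior.rvs(10000)
    print("P(>10) ≈", (stats.poisson(lam).rvs() > 10).mean())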


Example 9.7 Estimate of the mean of a normal distribution with unknown standard deviation

Assume that we have a set of n data samples from a normal distribution with unknown mean μ and
unknown standard deviation σ. We would like to determine our best estimate of the mean together
with the appropriate level of uncertainty. A normal distribution can have a mean anywhere in (−∞,
+∞), so we could use a uniform improper prior π(μ) = k. From the discussion in Section 9.2.2, the
uninformed prior for the standard deviation should be π(σ) = 1/σ to ensure invariance under a linear
transformation. The likelihood function is given by the normal distribution density function:

    l(X|μ, σ) = Π_i (1/(σ√(2π))) exp(−(x_i − μ)²/(2σ²))


Figure 9.18 Uncertainty distributions for λ by various methods.

Multiplying the priors together with the likelihood function and integrating over all possible values of
σ, we arrive at the posterior distribution for μ:

    f(μ) ∝ [(n − 1)s² + n(μ − x̄)²]^(−n/2)

where x̄ and s are the mean and sample standard deviation of the data values.
Now the Student t-distribution with ν degrees of freedom has the probability density

    f(x) ∝ (1 + x²/ν)^(−(ν+1)/2)

The equation for f(μ) is of the same form as the equation for f(x) if we set ν = n − 1. If we divide
the term inside the square brackets for f(μ) by the constant (n − 1)s², we get

    f(μ) ∝ [1 + n(μ − x̄)²/((n − 1)s²)]^(−n/2)

so the equation above for f(μ) equates to a shifted, rescaled Student t-distribution with (n − 1) degrees
of freedom. Specifically, μ can be modelled as

    μ = x̄ + t(n − 1) · s/√n

where t(n − 1) represents the Student t-distribution with (n − 1) degrees of freedom. This is the exact
result used in classical statistics, as described in Section 9.1.3. +
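An illustrative sketch of this result in Python (not from the text; it borrows the pupil-diameter data that appears in Section 9.3.1 and assumes scipy is available):

    # Uncertainty about a normal mean with unknown standard deviation:
    # mu = xbar + t(n-1) * s / sqrt(n)
    import numpy as np
    from scipy import stats

    x = np.array([5.92, 5.06, 6.16, 5.60, 4.87, 5.61, 5.72, 5.36, 6.03, 5.71])
    n, xbar, s = len(x), x.mean(), x.std(ddof=1)

    mu = stats.t(df=n - 1, loc=xbar, scale=s / np.sqrt(n))
    print("best estimate:", xbar)
    print("90% uncertainty interval:", mu.ppf([0.05, 0.95]))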
Example 9.8 Estimate of the mean of a normal distribution with known standard deviation

This is a more specific case than the previous example and might occur, for example, if one was
making many measurements of the same parameter but believed that the measurements had independent,


normally distributed errors and no bias (so the distribution of possible values would be centred about
the true value).
We proceed in exactly the same way as before, giving a uniform prior for μ and using a normal
likelihood function for the observed n measurements {x_i}. No prior is needed for σ since it is known,
and we arrive at a posterior distribution for μ given by

    f(μ) ∝ Π_i exp(−(x_i − μ)²/(2σ²))

Taking logs gives, i.e. since σ is known,

    L(μ) = −Σ_i (x_i − μ)²/(2σ²) + k

where k is some constant. Differentiating twice, we get

    dL(μ)/dμ = Σ_i (x_i − μ)/σ²        d²L(μ)/dμ² = −n/σ²

The best estimate μ0 of μ is that value for which dL(μ)/dμ = 0:

    μ0 = (Σx_i)/n = x̄

i.e. μ0 is the average of the data values x̄ - no surprise there! A Taylor series expansion of this function
about μ0 gives

    L(μ) = L(μ0) + (1/2)(d²L(μ)/dμ²)(μ − μ0)² = L(μ0) − n(μ − μ0)²/(2σ²)

The second term of the expansion (the first-derivative term) is missing because it equals zero, and there are no other higher-order terms since
d²L(μ)/dμ² = −n/σ² is independent of μ and any further differential therefore equals zero. Consequently, Equation (9.16) is an exact result.


Taking natural exponents to convert back to f(μ), and rearranging a little, we get

    f(μ) = K exp(−n(μ − x̄)²/(2σ²))

where K is a normalising constant. By comparison with the probability density function for the normal
distribution, it is easy to see that this is just a normal density function with mean x̄ and standard
deviation σ/√n. In other words

    μ = Normal(x̄, σ/√n)

which is the classical statistics result of Equation (9.4) and a result predictable from the central limit
theorem.

+

Exercise 9.4: Bayesian uncertainty for the standard deviation of a normal distribution.
Show that the Bayesian inference results of uncertainty about the standard deviation of a normal
distribution take a similar form to the classical statistics results of Section 9.1.2.

9.2.6 Markov chain simulation: the Metropolis algorithm and the Gibbs
sampler
Gibbs sampling is a simulation technique for obtaining a required Bayesian posterior distribution and is
particularly useful for multiparameter models where it is difficult algebraically to define, normalise and
draw from the posterior distribution. The method is based on Markov chain simulation: a technique that
creates a Markov process (a type of random walk) whose stationary distribution (the distribution of
the values it will take after a very large number of steps) is the required posterior distribution. The
technique requires that one run the Markov chain for a sufficiently large number of steps to be close to
the stationary distribution, and then record the generated values. The trick to a Markov chain model
is to determine a transition distribution T_i(θ^i | θ^(i−1)) (the distribution of possible values for the Markov
chain at its ith step θ^i, conditional on the value θ^(i−1) generated at the (i − 1)th step) that converges to
the posterior distribution.
The Metropolis algorithm

The transition distribution is a combination of a symmetric jumping distribution J_i(θ*|θ^(i−1)),
which lets one move from the current value θ^(i−1) to another randomly selected value θ*, and a weighting function
that assigns the probability of jumping to θ* (as opposed to staying still) as the ratio r, where

    r = f(θ*|x) / f(θ^(i−1)|x)

so that

    θ^i = θ*          with probability min[1, r]
        = θ^(i−1)     otherwise


The technique relies on being able to sample from J_i for all i and θ^(i−1), as well as being able to
calculate r for all jumps. For multiparameter problems, the Metropolis algorithm is very inefficient: the
Gibbs sampler provides a method that achieves the same posterior distribution but with far fewer model
iterations.
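As an illustration (a sketch, not the book's own implementation), here is a minimal single-parameter Metropolis sampler in Python, using the unnormalised earthquake posterior of Example 9.6 as the target and a normal jumping distribution:

    # Minimal Metropolis sampler sketch. log_post returns the log of any
    # unnormalised posterior; here, log f(theta) = (S-1)*ln(theta) - n*theta.
    import numpy as np

    rng = np.random.default_rng(1)
    n, S = 20, 80                      # illustrative data summaries

    def log_post(theta):
        return -np.inf if theta <= 0 else (S - 1) * np.log(theta) - n * theta

    theta = 4.0                        # starting value
    samples = []
    for i in range(20000):
        proposal = theta + rng.normal(0, 0.5)        # symmetric jump
        r = np.exp(log_post(proposal) - log_post(theta))
        if rng.uniform() < min(1.0, r):              # accept with probability min[1, r]
            theta = proposal
        samples.append(theta)

    burned = np.array(samples[5000:])                # discard a burn-in period
    print("posterior mean ≈", burned.mean(), "sd ≈", burned.std())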
The Gibbs sampler

The Gibbs sampler, also called alternating conditional sampling, is used in multiparameter problems,
i.e. where θ is a d-dimensional vector with components (θ_1, ..., θ_d). The Gibbs sampler cycles through
all the components of θ in each iteration, so there are d steps per iteration. The order in which the
components are taken is changed at random from one iteration to the next. In a cycle, the kth component
(k = 1 to d) is replaced in turn, while all of the other components are kept fixed, with a value drawn
from a distribution with probability density

    p(θ_k | θ_(−k)^(i−1), x)

where θ_(−k)^(i−1) are all the other components of θ except θ_k, at their current values. This may look rather
awkward, as one has to determine and sample from d separate distributions for each iteration of the
Gibbs sampler. However, the conditional distributions are often conjugate distributions, which makes
sampling from them a lot simpler and quicker. Have a look at Gelman et al. (1995) for a very readable
discussion of various Markov chain models, and for a number of examples of their use. Gilks et al.
(1996) is written by some of the real gurus of MCMC methods.
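For instance (an illustrative sketch, not from the text), a Gibbs sampler for the mean and variance of a normal sample under the uninformed priors used in Example 9.7 (π(μ) constant, π(σ) ∝ 1/σ) cycles between two conjugate conditional distributions:

    # Gibbs sampler sketch for (mu, sigma^2) of a normal sample.
    # Conditionals: mu | sigma^2 ~ Normal(xbar, sigma/sqrt(n));
    #               1/sigma^2 | mu ~ Gamma(n/2, scale = 2 / sum((x - mu)^2)).
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(100, 10, size=30)               # illustrative data
    n, xbar = len(x), x.mean()

    mu, sigma2 = xbar, x.var()
    draws = []
    for i in range(10000):
        mu = rng.normal(xbar, np.sqrt(sigma2 / n))                   # step 1
        precision = rng.gamma(n / 2, 2 / np.sum((x - mu) ** 2))      # step 2
        sigma2 = 1 / precision
        draws.append((mu, sigma2))

    mus, s2s = np.array(draws[2000:]).T            # discard burn-in
    print("mu ≈", mus.mean(), " sigma ≈", np.sqrt(s2s).mean())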
MCMC in practice

Some terribly smart people write their own Gibbs sampling programs, but for the rest of us there is a
product called WinBUGS, developed originally at Cambridge University. It is free to download and is the
software most used for MCMC modelling. It isn't that easy to get the software to work for you unless
you are familiar with S-Plus- or R-type script, and one always waits with bated breath for the message
"Compiled successfully" because there is rather little in the way of hints about what to do when it
doesn't compile. On the plus side, the actual probability model is quite intuitive to write and WinBUGS
has the flexibility to allow different datasets to be incorporated into the same model. The software is
also continuously improving, and several people have written interfaces to it through the OpenBUGS
project. To use the WinBUGS output, you will need to export the CODA file for the data (after a sufficient
burn-in) to a spreadsheet, rearrange the data to have one column per parameter and then randomly
sample across a row (i.e. one MCMC iteration) in just the same way I explain for bootstrapping paired
data. The ModelRisk function VoseNBootPaired allows you to do this very simply.

9.3 The Bootstrap
The bootstrap was introduced by Efron (1979) and is explored in great depth in Efron and Tibshirani
(1993) and perhaps more practically in Davison and Hinkley (1997). This section presents a rather brief
introduction that covers most of the important concepts. The bootstrap appears at first sight to be rather
dubious, but it has earned its place as a useful technique because (a) it corresponds well to traditional
techniques where they are available, particularly when a large dataset has been obtained, and (b) it offers
an opportunity to assess the uncertainty about a parameter where classical statistics has no technique
available and without recourse to determining a prior.


The "bootstrap" gets its name from the phrase "to pull yourself up by your bootstraps", which is
thought to originate from one of the tales in the Adventures of Baron Munchausen by Rudolph Erich
Raspe (1737- 1794). Baron Munchausen (1720-1797) actually existed and was known as an enormous
boaster, especially of his exploits during his time as a Russian cavalry officer. Raspe wrote ludicrous
stories supposedly in his name (he would have been sued these days). In one story, the Baron was at the
bottom of a deep lake and in some trouble, until he thought of pulling himself up by his bootstraps. The
name "bootstrap" does not perhaps engender much confidence in the technique: you get the impression
that there is an attempt somehow to get something from nothing - actually, it does seem that way when
one first looks at the technique itself. However, the bootstrap has shown itself to be a powerful method
of statistical analysis and, if used with care, can provide results very easily and in areas where traditional
statistical techniques are not available.
In its simplest form, which is the non-parametric bootstrap, the technique is very straightforward
indeed. The standard notation as used by Efron is perhaps a little confusing, though, to the beginner,
and, since I am not going into any great sophistication in this book, I have modified the notation a little
to keep it as simple as possible. The bootstrap is used in similar conditions to Bayesian inference, i.e.
we have a set of data x randomly drawn from some population distribution F for which we wish to
estimate some statistical parameter.
The jackknife

The bootstrap was originally developed from a much earlier technique called the jackknife. The jackknife
was used to review the accuracy of a statistic calculated from a set of data. A jackknife value is the
statistic of interest recalculated with the ith value removed from the dataset.
With a dataset of n values, one thus has n jackknife values, the distribution of which gives a feel
for the uncertainty one has about the true value of the statistic. I say "gives a feel" because the reader
is certainly not recommended to use the jackknife as a method for obtaining any precise estimate of
uncertainty. The jackknife turns out to be quite a poor estimator of uncertainty and can be greatly
improved upon.

9.3.1 The non-parametric bootstrap
Imagine that we have a set of n random measurements of some characteristic of a population (the
height of 100 blades of grass from my lawn, for example) and we wish to estimate some parameter
of that population (the true mean height of all blades of grass from my lawn, for example). Bootstrap
theory says that the true distribution F of these heights can be reasonably approximated by the
distribution F̂ of the observed values. Obviously, this is a more reasonable assumption the more data one
has collected. The theory then constructs this distribution F̂ from the n observed values, takes another
n random samples (with replacement) from that constructed distribution and calculates the statistic of
interest from that sample. The sampling from the constructed distribution and the statistic calculation are
repeated a large number of times until a reasonably stable distribution of the statistic of interest is
obtained. This is the distribution of uncertainty about the parameter.
The method is best illustrated with a simple example. Imagine that I work for a contact lens manufacturer in Auckland and for some reason would really like to know the mean diameter of the pupils of
the eyes of New Zealand's population under some specific light condition. I have a limited budget, so
I randomly select 10 people off the street and measure their pupils while controlling the ambient light.
The results I get are (in mm): 5.92, 5.06, 6.16, 5.60, 4.87, 5.61, 5.72, 5.36, 6.03 and 5.71. This dataset
forms my bootstrap estimate of the true distribution for the whole population, so I now randomly sample


Figure 9.19 Example of a non-parametric bootstrap model. (Formulae: B4:B13 hold the data values, C4:C13 contain =VoseDUniform($B$4:$B$13) and cell C14 contains =AVERAGE(C4:C13).)

Figure 9.20 Uncertainty distribution resulting from the model of Figure 9.19 (relative frequency against mean pupil diameter, mm).

with replacement from the distribution to get 10 bootstrap samples. The spreadsheet in Figure 9.19
illustrates the bootstrap sampling: column B lists the original data and column C gives 10 bootstrap
samples from these data using the Duniform({x}) distribution (Duniform({x}) is a discrete distribution
where all values in the {x} array are equally likely). Cell C14 then calculates the statistic of interest
(the mean) from this sample. Running a 10000 iteration simulation on this cell produces the bootstrap
uncertainty distribution shown in Figure 9.20. The distribution is roughly normal (skewness = -0.16,
kurtosis = 3.02) with mean = 5.604 - the mean of the original dataset.


In summary, the non-parametric bootstrap proceeds as follows (a sketch of these steps in code is given below):
1. Collect the dataset of n samples {x_1, ..., x_n}.
2. Create B bootstrap samples {x_1*, ..., x_n*}, where each x_i* is a random sample taken with replacement from {x_1, ..., x_n}.
3. For each bootstrap sample {x_1*, ..., x_n*}, calculate the required statistic θ̂. The distribution of these B estimates of θ̂ represents the bootstrap estimate of uncertainty about the true value of θ.
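These steps translate directly into a few lines of Python (an illustrative sketch using the pupil data above; numpy is assumed):

    # Non-parametric bootstrap of the mean pupil diameter (data from the text).
    import numpy as np

    rng = np.random.default_rng(3)
    data = np.array([5.92, 5.06, 6.16, 5.60, 4.87, 5.61, 5.72, 5.36, 6.03, 5.71])

    B = 10000
    boot_means = np.array([
        rng.choice(data, size=len(data), replace=True).mean()   # one bootstrap replicate
        for _ in range(B)
    ])

    print("bootstrap mean:", boot_means.mean())
    print("90% uncertainty interval:", np.percentile(boot_means, [5, 95]))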
Example 9.9 Bootstrap estimate of prevalence

Prevalence is the proportion of a population that has a particular characteristic. An estimate of the
prevalence P is usually made by randomly sampling from the population and seeing what proportion
of the sample has that particular characteristic. Our confidence around this single-point estimate can be
obtained quite easily using the non-parametric bootstrap. Imagine that we have randomly surveyed 50
voters in Washington, DC, and asked them how many will be voting for the Democrats in a presidential
election the following day. Let's rather naïvely assume that they all tell the truth and that none of
them will have a change of mind before tomorrow. The result of the survey is that 19 people said they
will vote Democrat. Our dataset is therefore a set of 50 values, 19 of which are 1 and 31 of which
are 0. A non-parametric bootstrap would sample with replacement from this dataset, so the number of 1s in a
bootstrap replicate is equivalent to a Binomial(50, 19/50). The estimate of prevalence is then just the proportion of the
bootstrap sample that is 1, i.e. P = Binomial(50, 19/50)/50. This is exactly the same as the classical
statistics estimate given in Equation (9.6), and, interestingly, the parametric bootstrap (see next section)
has exactly the same estimate in this example too. The distribution being sampled in a parametric
bootstrap is a Binomial(1, P ) from which we have 50 samples and our MLE (maximum likelihood
estimator) for P is 19/50. Thus, the 50 parametric bootstrap replicates could be summed together as a
Binomial(50, 19/50), and our estimate for P is again Binomial(50, 19/50)/50.
We could have used a Bayesian inference approach. With a Uniform(0, 1) prior and a binomial
likelihood function (which assumes the population is much larger than the sample), we would have an
estimate of prevalence using the beta distribution (see Section 8.2.3):

    P = Beta(s + 1, n − s + 1) = Beta(20, 32)

Figure 9.21 plots the Bayesian estimate alongside the bootstrap for comparison. They are very close,
except that the bootstrap estimate is discrete and the Bayesian is continuous, and, as the sample size
increases, they would become progressively closer. +
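A quick numerical check of this comparison can be sketched as follows (illustrative only, assuming scipy):

    # Compare bootstrap and Bayesian estimates of prevalence (19 of 50 voters).
    import numpy as np
    from scipy import stats

    n, s = 50, 19
    boot = stats.binom(n, s / n)           # bootstrap: Binomial(50, 19/50)/50
    bayes = stats.beta(s + 1, n - s + 1)   # Bayesian: Beta(20, 32)

    for q in (0.05, 0.5, 0.95):
        print(f"q={q}: bootstrap {boot.ppf(q) / n:.3f}, Bayesian {bayes.ppf(q):.3f}")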

9.3.2 The parametric bootstrap
The non-parametric bootstrap in the previous section made no assumptions about the distributional
form of the population (parent) distribution. However, there will be many times that we will know to
which family of distributions the parent distribution belongs. For example, the number of earthquakes
each year and the number of Giardia cysts in litres of water drawn from a lake will logically both
be approximately Poisson distributed; the time between phone calls to an exchange will be roughly
exponentially distributed and the number of males in randomly sampled groups of a certain size will be


Figure 9.21 Bootstrap and Bayesian estimates of prevalence for Example 9.9.

binomially distributed. The parametric bootstrap gives us a means to use the extra information we have
about the population distribution. The procedure is as follows:
1. Collect the dataset of n samples {x_1, ..., x_n}.
2. Determine the parameter(s) of the distribution, from the known distribution family, that best fit(s) the data, using maximum likelihood estimators (MLEs - see Section 10.3.1).
3. Generate B bootstrap samples {x_1*, ..., x_n*} by randomly sampling from this fitted distribution.
4. For each bootstrap sample {x_1*, ..., x_n*}, calculate the required statistic θ̂. The distribution of these B estimates of θ̂ represents the bootstrap estimate of uncertainty about the true value of θ.
We can illustrate the technique by using the pupil measurement data again. Let us assume that we know
for some reason (perhaps experience from other countries) that this measurement should be normally distributed for a population. The normal distribution has two parameters - its mean and standard deviation,
both of which we will assume to be unknown - and their MLEs are the mean and standard deviation
of the data to be fitted. The mean and standard deviation of the pupil measurements are 5.604 mm and
0.410 mm respectively. Figure 9.22 shows a spreadsheet model where, in column D, 10 Normal(5.604,
0.410) distributions are randomly sampled to give the bootstrap sample. Cell D14 then calculates the mean
(the statistic of interest) of the bootstrap sample. Figure 9.23 shows the results of this parametric bootstrap model, together with the result from applying the classical statistics method of Equation (9.2) - they
are very similar. The result also looks very similar to the non-parametric distribution of Figure 9.20.
In comparison with the classical statistics model, which happens to be exact for this particular problem
(i.e. when the parent distribution is normal), both bootstrap methods provide a narrower range. In other
words, the bootstrap in its simplest form tends to underestimate the uncertainty associated with the
parameter of interest. A number of corrective measures are proposed in Efron and Tibshirani (1993).
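A Python sketch of the parametric bootstrap of Figure 9.22 (illustrative, under the same normality assumption; the sample standard deviation is used, as in the spreadsheet):

    # Parametric bootstrap of the mean pupil diameter, assuming a normal parent.
    import numpy as np

    rng = np.random.default_rng(4)
    data = np.array([5.92, 5.06, 6.16, 5.60, 4.87, 5.61, 5.72, 5.36, 6.03, 5.71])
    mu_hat, sd_hat = data.mean(), data.std(ddof=1)   # fitted normal parameters

    B = 10000
    boot_means = rng.normal(mu_hat, sd_hat, size=(B, len(data))).mean(axis=1)

    print("90% uncertainty interval for the true mean:",
          np.percentile(boot_means, [5, 95]))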


Figure 9.22 Example of a parametric bootstrap model. (Column C holds the data values with their mean in cell C14 =AVERAGE(C4:C13) and standard deviation in cell C15 =STDEV(C4:C13); column D holds the bootstrap sample, D4:D13 =VoseNormal($C$14,$C$15).)

Figure 9.23 Results of the parametric bootstrap model of Figure 9.22, together with the classical statistics result (distributions of the true mean pupil diameter, mm).

Imagine that we wish to estimate the true depth of a well using some sort of sonic probe. The probe has
a known standard error σ = 0.2 m, i.e. σ is the standard deviation of the normally distributed variation
in results the probe will produce when repeatedly measuring the same depth. In order to estimate this
depth, we take n separate measurements. These measurements have a mean of x̄ metres. The parametric
bootstrap model would take the average of n Normal(x̄, σ) distributions to estimate the true mean μ of
the distribution of possible measurement results, i.e. the true well depth. From the central limit theorem,
we know that this calculation is equivalent to

    μ = Normal(x̄, σ/√n)

which is the classical statistics result of Equation (9.3). +
Parametric bootstrap estimate of the standard deviation of a normal distribution

It can also be shown that the parametric bootstrap estimates of the standard deviation of a normal
distribution, when the mean is and is not known, are exactly the same as the classical statistics estimates
given in Equations (9.5) and (9.6) (the reader may like to prove this, bearing in mind that the ChiSq(ν)
distribution is the sum of the squares of ν independent unit normal distributions).
Example 9.10 Parametric bootstrap estimate of mean time between calls at a telephone exchange

Imagine that we want to predict the number of phone calls there will be at an exchange during a
particular hour in the working day (say 2 p.m. to 3 p.m.). Imagine that we have collected data from this
period on n separate, randomly selected days. It is reasonable to assume that telephone calls will arrive
at a Poisson rate since each call will be, roughly speaking, independent of every other. Thus, we could
use a Poisson distribution to model the number of calls in an hour. The maximum likelihood estimate
(MLE) of the mean number of calls per hour at this time of day is simply the average number of calls
observed in the test periods x̄ (see Example 10.3 for proof). Thus, our bootstrap replicate is a set of n
independent Poisson(x̄) distributions. To generate our uncertainty about the true mean number of phone
calls per hour at this time of the day, we calculate the mean of the bootstrap replicate, i.e. the
average of n independent Poisson(x̄) distributions. The sum of n independent Poisson(x̄) distributions
is simply Poisson(nx̄), so the average of n Poisson(x̄) distributions is Poisson(nx̄)/n, where nx̄ is
simply the sum of the observations. So, in general, if one has observations from n periods, the Poisson
parametric bootstrap for the mean number of observations per period λ is given by

    λ = Poisson(S)/n

where S is the sum of the observations in the n periods.
The uncertainty distribution of λ should be continuous, as λ can take any positive real value. However,
the bootstrap will only generate discrete values for λ, i.e. (0, 1/n, 2/n, ...). When n is large this is not
a problem since the allowable values are close together, but when S is small the approximation starts
to fall down. Figure 9.24 illustrates three Poisson parametric bootstrap estimates for λ for S = 2, 10
and 20 combined with n = 5. For S = 2, the discreteness will in some circumstances be an inadequate
uncertainty model for λ, and a different technique like Bayesian inference would be preferable. However,
for values of S around 20 or more, the allowable values are relatively close together. For large S, one
can also add back the continuous character of the parameter by making a normal approximation to
the Poisson, i.e. since Poisson(a) ≈ Normal(a, √a) we get

    λ ≈ Normal(S, √S)/n = Normal(S/n, √S/n)

or, replacing S/n with x̄, we get

    λ ≈ Normal(x̄, √(x̄/n))


Figure 9.24 Three Poisson parametric bootstrap estimates for λ for S = 2, 10 and 20 from Example 9.10.


which also illustrates the familiar reduction in uncertainty as the square root of the number of data
points n. +
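An illustrative Python sketch of Example 9.10 (the daily call counts below are hypothetical):

    # Poisson parametric bootstrap for the mean number of calls per hour.
    import numpy as np

    rng = np.random.default_rng(5)
    calls = np.array([12, 9, 15, 11, 8, 14, 10, 13])   # hypothetical counts for n days
    n, S = len(calls), calls.sum()

    B = 10000
    lam = rng.poisson(S, size=B) / n                    # lambda = Poisson(S)/n
    print("90% uncertainty interval for lambda:", np.percentile(lam, [5, 95]))

    # Normal approximation for large S: lambda ~ Normal(xbar, sqrt(xbar/n))
    xbar = S / n
    print("normal approximation interval:",
          xbar + np.array([-1.6449, 1.6449]) * np.sqrt(xbar / n))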

9.3.3 The Bayesian bootstrap
The Bayesian bootstrap is considered to be a robust Bayesian approach for estimating a parameter of a
distribution where one has a random sample x from that distribution. It proceeds in the usual bootstrap
way, determining a distribution of θ̂, the density of which is then interpreted as the likelihood
function l(x|θ). This is then used in the standard Bayesian inference formula (Equation (9.8)) along
with a prior distribution π(θ) for θ to determine the posterior distribution. In many cases, the bootstrap
distribution for θ̂ closely approximates a normal distribution, so, by calculating the mean and standard
deviation of the B bootstrap replicates θ̂, one can quickly define a likelihood function.

9.4 Maximum Entropy Principle
The maximum entropy formalism (sometimes known as MaxEnt) is a statistical method for determining
a distribution of maximum logical uncertainty about some parameter, consistent with a certain limited
amount of information. For a discrete variable, MaxEnt determines the distribution that maximises the
function H(x), where

    H(x) = −Σ_(i=1..M) p_i ln(p_i)

and where p_i is the confidence for each of the M possible values x_i of the variable x. The function
H(x) takes the equation of a statistical mechanics property known as entropy, which gives the principle
its name. For a continuous variable, H(x) takes the form of an integral function:

    H(x) = −∫ f(x) ln(f(x)) dx

The appropriate uncertainty distribution is determined by the method of Lagrange multipliers, and, in
practice, the continuous-variable equation for H(x) is replaced by its discrete counterpart. It is beyond
the scope of this book to look too deeply into the mathematics, but there are a number of results that
are of general interest. MaxEnt is often used to determine appropriate priors in a Bayesian analysis,
so the results listed in Table 9.3 give some reassurance to prior distributions we might wish to use
conservatively to represent our prior knowledge.
The reader is recommended Sivia (1996) for a very readable explanation of the principle of MaxEnt
and derivation of some of its results. Gzyl (1995) provides a far more advanced treatise on the subject, but
requires a much higher level of mathematical understanding. The normal distribution result is interesting
and provides some justification for the common use of the normal distribution when all we know is the
mean and variance (standard deviation), since it represents the most reasonably conservative estimate
of the parameter given that set of knowledge. The uniform distribution result is also very encouraging
when estimating a binomial probability, for example. The use of a Beta(s + a, n − s + b) to represent
the uncertainty about the binomial probability p when we have observed s successes in n trials assumes
a Beta(a, b) prior. A Beta(1, 1) is a Uniform(0, 1) distribution, and thus our most honest estimate of p
is given by Beta(s + 1, n − s + 1).


Table 9.3 Maximum entropy method.

State of knowledge                                          MaxEnt distribution
Discrete parameter, n possible values {x_i}                 DUniform({x_i}), i.e. p(x_i) = 1/n
Continuous parameter, minimum and maximum                   Uniform(min, max), i.e. f(x) = 1/(max − min)
Continuous parameter, known mean μ and variance σ²          Normal(μ, σ)
Continuous parameter, known mean μ                          Expon(μ)
Discrete parameter, known mean μ                            Poisson(μ)

9.5 Which Technique Should You Use?
I have discussed a variety of methods for estimating your uncertainty about some model parameter. The
question now is which one is best? There are some situations where classical statistics has exact methods
for determining confidence intervals. In such cases, it is sensible to use those methods of course, and the
results are unlikely to be challenged. In situations where the assumptions behind traditional statistical
methods are being stretched rather too much for comfort, you will have to use your judgement as
to which technique to use. Bootstraps, particularly the parametric bootstrap, are powerful classical
statistics techniques and have the advantage of remaining purely objective. They are widely accepted
by statisticians and can also be used to determine uncertainty distributions for statistics like the median,
kurtosis or standard deviation for parent distributions where classical statistics have no method to offer.
However, the bootstrap is a fairly new (in statistical terms) technique, so you may find people resisting making
decisions based on its results, and the results can be rather "grainy".
The Bayesian inference technique requires some knowledge of an appropriate likelihood function,
which may be difficult and will often require some subjectivity in assessing what is a sufficiently
accurate function to use. Bayesian inference also requires a prior, which can be contentious at times, but
has the potential to include knowledge that the other techniques cannot allow for. Traditional statisticians
will sometimes offer a technique to use on your data that implicitly assumes a random sample from a
normal distribution, though the parent distribution is clearly not normal. This usually involves some sort
of approximation or a translation of the data (e.g. by taking logs) to make the data better fit a normal
distribution. While I appreciate the reasons for doing this, I do find it difficult to know what errors one
is introducing by such data manipulation.
Pretty often in our consulting work there is no option but to use Gibbs sampling because it is the only
way to handle multivariate estimates that are good for risk analysis. The WinBUGS program may be
a little difficult to use but the models can be made very transparent. I suggest that, if the parameter to
your model is important, it may well be worth comparing two techniques (for example, non-parametric
bootstrap (or parametric, if possible) and Bayesian inference with an uninformed prior). It will certainly
give you greater confidence if there is reasonable agreement between any two methods you might choose.
What is meant by reasonable will depend on your model and the level of accuracy your decision-maker
needs from that model. If you find there appears to be some reasonable disagreement between two
methods that you test, you could try running your model twice, once with each estimate, and seeing
if the model outputs are significantly different. Finally, if the uncertainty distributions between two
methods are significantly different and you cannot choose between them, it makes sense to accept
that this is another source of uncertainty and simply combine the two distributions, using a discrete
distribution, in the same way I describe in Section 14.3.4 on combining differing expert opinions.


9.6 Adding Uncertainty in Simple Linear Least-Squares Regression Analysis
In least-squares regression, one is attempting to model the change in one variable y (the response or
dependent variable) as a function of one or more other variables {x} (the explanatory or independent
variables). The regression relationship between {x} and y minimises the sum of squared errors between
a fitted equation for y and the observations. The theory of least-squares regression assumes the random
variations about this line (resulting from effects not explained by the explanatory variables) to be
normally distributed with constant variance across all {x} values, which means the fitted line describes
the mean y value for a given set of {x}. For simplicity we will consider a single explanatory variable
x (i.e. simple regression analysis) and assume that the relationship between x and y is linear (i.e. linear
regression analysis); that is, we will model the variability in y as a result of changes in x with the
following equation:

    y = Normal(mx + c, σ)

where m and c are the gradient and y intercept of the straight-line relationship between x and y, and
σ is the standard deviation of the additional variation observed in y that is not explained by the linear
equation in x. Figure 6.11 illustrates these concepts. In least-squares linear regression, we typically have
a set of n paired observations {x_i, y_i} for which we wish to fit this linear relationship.

9.6.1 Classical statistics
Classical statistics theory (see Section 6.3.9) provides us with the best-fitting values for m, c and σ,
assuming the model's assumptions to be correct, which we will name m̂, ĉ and σ̂. It also gives us
exact distributions of uncertainty for the estimate ŷ_p = (m̂ x_p + ĉ) at some value x_p (see, for example,
McClave, Dietrich and Sincich, 1997) and for σ, as follows:

    ŷ_p = m̂ x_p + ĉ + t(n − 2) · s · √(1/n + (x_p − x̄)²/SS_x)

    σ = s · √((n − 2)/χ²(n − 2))

where SS_x = Σ(x_i − x̄)²,

t(n − 2) is a Student t-distribution with (n − 2) degrees of freedom, χ²(n − 2) is a chi-square distribution with (n − 2) degrees of freedom and s is the standard deviation of the differences e_i between the
observed value y_i and its predictor ŷ_i = m̂ x_i + ĉ, i.e.

    s = √( Σ e_i² / (n − 2) )


Figure 9.25 Simple least-squares regression uncertainty about ŷ for the dataset of Table 9.4 (two panels: against body weight (kg) and against log10 body weight (kg)).

The uncertainty distribution for σ is independent of the uncertainty distribution for (mx + c), since
the model assumes that the random variations about the regression line are constant, i.e. that they are
independent of the values of x and y. It turns out that these same results are given by Bayesian inference
with uninformed priors, i.e. π(m, c, σ) ∝ 1/σ.
The uncertainty equation for ŷ_i = m x_i + c produces a relationship between x and y with uncertainty
that is pinched at the middle, as shown in the simple least-squares regression analysis of Figure 9.25
for the data in Table 9.4. This makes sense since, the further we move towards the extremes of the set
of observations, the more uncertain we should be about the relationship. Table 9.4 describes the relationship between the body weight of a mammal in kilograms and the mean weight of the brain of a mammal
in grams at that body weight. Strictly speaking, the theory of regression analysis says that the relationship can only be considered to hold within the range of observed values for x. However, with
caution, one can reasonably extrapolate a little past the range of observed body weights, although,
the further one extends beyond the observed range, the more tenuous the validity of the analysis
becomes.
Including uncertainty in a regression analysis means that we now have a family of normal distributions
representing the possible value of y, given a specific value for x . The normal distribution reflects the
observed variability about the regression line. That there is a family of these distributions reflects our



Table 9.4 Experimental measurements of the weight of mammals' bodies and brains.

Brain weight (g)      Body weight (kg)
0.0436                0.685
0.4492                29.05
1.698                 175.92
2.844                 50.856
14.69                 155.74
16.265                294.52
22.309                193.49
372.97                1034.4
713.72                9958.02
3270.15               35160.5

uncertainty about the coefficients for the regression equation and therefore the parameters for the normal
distribution.
The bootstrap

The variables x and y will fit a simple least-squares regression model if the underlying relationship
between these two variables is one of two forms: type A, where the {x_i, y_i} observations are drawn
from a bivariate normal distribution in x and y; or type B, where, for any value x, the distribution of
possible response values in y is Normal(mx + c, σ(x)) and, for the time being, σ(x) = σ,
i.e. the random variations about the line have the same standard deviation (known as homoscedasticity).
In order to use the bootstrap to determine the uncertainty about the regression coefficients, we must
first determine which of these two relationships is occurring. Essentially, this is equivalent to the design
of the experiment that produced the {x_i, y_i} observations. The experiment design is of type A if we
are making random observations of x and y together, whereas the experiment design is of type B if
we are testing at different specific values of x to determine the response in y. So, for example, the
{body weight, brain weight} data from Table 9.4 are of type A if we have attempted to pick a fairly
random sample of mammals, whereas they would be of type B if we had picked an animal from
each of the 20 subspecies of a species of some particular mammal. If, for example, we were doing
an experiment to demonstrate Hooke's law by adding incremental weights to a hanging spring and
observing the resultant extension beyond the spring's original length, the {mass, extension} observations
would again be of type B, because we are specifically controlling the x values to observe the resultant
y values.
For type A data, the regression coefficients can be thought of as parameters of a bivariate normal
distribution. Thus, using the non-parametric bootstrap, we simply resample from the paired observations
{x_i, y_i} and, at each bootstrap replicate, calculate the regression coefficients. Figure 9.26 illustrates this
type of analysis set out in a spreadsheet model for the dataset of Table 9.4.
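A Python sketch of this pairs-resampling (type A) bootstrap (illustrative; it uses the Table 9.4 data and, following the formulae of Figure 9.26, regresses log10 body weight on log10 brain weight):

    # Type A bootstrap: resample (x, y) pairs and refit the regression each time.
    import numpy as np

    rng = np.random.default_rng(6)
    brain = np.array([0.0436, 0.4492, 1.698, 2.844, 14.69,
                      16.265, 22.309, 372.97, 713.72, 3270.15])
    body = np.array([0.685, 29.05, 175.92, 50.856, 155.74,
                     294.52, 193.49, 1034.4, 9958.02, 35160.5])
    x, y = np.log10(brain), np.log10(body)

    B, n = 5000, len(x)
    slopes, intercepts = [], []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # resample pairs with replacement
        m, c = np.polyfit(x[idx], y[idx], 1)      # least-squares fit to the replicate
        slopes.append(m)
        intercepts.append(c)

    print("slope 90% interval:", np.percentile(slopes, [5, 95]))
    print("intercept 90% interval:", np.percentile(intercepts, [5, 95]))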
For type B data, the x values are fixed since they were predetermined rather than resulting from a
random sample from a distribution. Assuming the random variations about the regression line to be
homoscedastic and the straight-line relationship to be correct, the only random variable involved is



Figure 9.26 Example model for a data pairs resampling (type A) bootstrap regression analysis. (Columns B and C hold the Table 9.4 data; columns D and E take their logs with =LOG(B4); column F resamples the log brain weights with =VoseDuniform($D$4:$D$13); column G retrieves the paired log body weight with =VLOOKUP(F4,$D$4:$E$13,2); the replicate's coefficients are m =SLOPE(G4:G13,F4:F13), c =INTERCEPT(G4:G13,F4:F13) and Steyx =STEYX(G4:G13,F4:F13).)
Figure 9.27 Example model for a residuals resampling (type B) parametric bootstrap regression analysis. (Columns B to E repeat the data and logs of Figure 9.26; column G holds the residuals =E4-TREND($E$4:$E$13,$D$4:$D$13,D4) with their standard deviation in G15 =STDEV(G4:G13); column H generates the bootstrap response =VoseNormal(E4-G4,$G$15); the outputs are =SLOPE(H4:H13,D4:D13), =INTERCEPT(H4:H13,D4:D13) and =STEYX(H4:H13,D4:D13).)


that producing the variations about the line, and so we seek to bootstrap the residuals. If we know the
residuals are normally distributed, we can use a parametric bootstrap model, as follows (a sketch in code is given after the list):
1. Determine S_res - the standard deviation of the residuals about the least-squares regression line for
the original dataset.
2. For each of the x values in the dataset, randomly sample from a Normal(ŷ, S_res), where ŷ = m̂x + ĉ
and m̂ and ĉ are the least-squares regression coefficients for the original dataset.
3. Determine the least-squares regression coefficients for this bootstrap sample.
4. Repeat for B iterations.
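A minimal Python version of these four steps (an illustrative sketch reusing the log-transformed data of Figure 9.26):

    # Type B parametric bootstrap: resample normally distributed residuals about the fitted line.
    import numpy as np

    rng = np.random.default_rng(7)
    x = np.array([-1.361, -0.348, 0.230, 0.454, 1.167,
                  1.211, 1.348, 2.572, 2.854, 3.515])   # log10 brain weight
    y = np.array([-0.164, 1.463, 2.245, 1.706, 2.192,
                  2.469, 2.287, 3.015, 3.998, 4.546])   # log10 body weight

    m_hat, c_hat = np.polyfit(x, y, 1)
    fitted = m_hat * x + c_hat
    s_res = (y - fitted).std(ddof=1)                    # step 1

    B = 5000
    coefs = np.array([np.polyfit(x, rng.normal(fitted, s_res), 1)   # steps 2-3
                      for _ in range(B)])               # step 4
    print("slope 90% interval:", np.percentile(coefs[:, 0], [5, 95]))
    print("intercept 90% interval:", np.percentile(coefs[:, 1], [5, 95]))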


Figure 9.27 illustrates this procedure in a spreadsheet model for the {body weight, brain weight} data.
Although this procedure works quite well, it would be better to use the classical statistics approach
described above, which offers exact answers under these conditions. However, a slight modification to the
above approach allows one to use a non-parametric bootstrap, i.e. where we can remove the assumption
of normally distributed residuals which may often not be very accurate. For the non-parametric model,

Figure 9.28 Example model for a residuals resampling (type B) non-parametric bootstrap regression analysis. (For each of the 16 masses of Table 9.5, the model computes the residual e_i =C3-TREND($C$3:$C$18,$B$3:$B$18,B3), the leverage h_i =1/16+(B3-$B$20)^2/$B$22 (B20 is the mean mass and B22 the sum of squared deviations), the modified residual r_i =D3/SQRT(1-E3) and a bootstrap extension =TREND($C$3:$C$18,$B$3:$B$18,B3)+Duniform($F$3:$F$18)-$F$19; the outputs are =SLOPE(G3:G18,B3:B18) and =INTERCEPT(G3:G18,B3:B18).)


we must first develop a non-parametric distribution of residuals by changing them to have constant
variance. We define the modified residual r_i as follows:

    r_i = e_i / √(1 − h_i)

where the leverage h_i is given by

    h_i = 1/n + (x_i − x̄)²/SS_x

The mean of the modified residuals r̄ is calculated. Then a bootstrap sample r_j* is drawn from the set
of r_i values and used to determine the quantity (ŷ_j + r_j* − r̄) for each x_j value, which is used in step 2
of the algorithm above. Figure 9.28 provides a spreadsheet illustration of this type of model using data
from Table 9.5.
In certain problems, it is logical that the y-intercept value c be set to zero. In this situation, the
leverage values are different:

    h_i = x_i² / Σx_j²

The modified residuals are thus also different and won't sum to zero, so it is essential to mean-correct
the residuals before they are used to simulate random errors.
Table 9.5 Experimental measurements of the variation in length of a vertical spring as weight is attached to its end.

Mass (kg)    Extension (mm)
0.0          137.393
0.1          138.954
0.2          139.977
0.3          140.107
0.4          142.765
0.5          145.606
0.6          147.881
0.7          147.011
0.8          144.194
0.9          144.949
1.0          152.161
1.1          149.694
1.2          154.700
1.3          162.037
1.4          155.275
1.5          160.221


Bootstrapping the data pairs is more robust than bootstrapping the residuals, as it is less sensitive
to any deviation from the regression assumptions, but it won't be as accurate where the assumptions are
correct. However, as the dataset increases in size, the results from bootstrapping the pairs approach
those from bootstrapping the residuals, and it is also easier to execute, of course. These techniques can
be extended to non-linear regression, non-constant variance and multiple linear regression, as described in detail
in Efron and Tibshirani (1993) and Davison and Hinkley (1997).

Chapter 10

Fitting distributions to data
In this chapter I use the statistical methods I've described in Chapter 9 to fit probability distributions to
data. I also briefly describe how regression models are fitted to data. There are other types of probability
models we use in risk analysis: fitting time series and copulas are described elsewhere in this book.
This chapter is concerned with a problem frequently confronted by the risk analyst: that of determining
a distribution to represent some variable in a risk analysis model. There are essentially two sources of
information used to quantify the variables within a risk analysis model. The first is available data and
the second is expert opinion. Chapter 14 deals with the quantification of the parameters that describe the
variability purely from expert opinion. Here I am going to look at techniques to interpret observed data for
a variable in order to derive a distribution that realistically models its true variability and our uncertainty
about that true variability. Any interpretation of data by definition requires some subjective input, usually
in the form of assumptions about the variable. The key assumption here is that the observed data can
be thought of as a random sample from some probability distribution that we are attempting to identify.
The observed data may come from a variety of sources: scientific experiments, surveys, computer
databases, literature searches, even computer simulations. It is assumed here that the analyst has satisfied
himself that the observed data are both reliable and as representative as possible. Anomalies in the data
should be checked out first where possible and any unreliable data points discarded. Thought should also
be given to any possible biases that could be produced by the method of data collection, for example:
a high-street survey may have visited an unrepresentative number of large or affluent towns; the data
may have come from an organisation that would benefit from doctoring the data, etc.
I start by encouraging analysts to review the data they have available and the characteristics of
the variable that is to be modelled. Several techniques are then discussed that enable analysts to fit
the available data to an empirical (non-parametric) distribution. The key advantages of this intuitive
approach are the simplicity of use, the avoidance of assuming some distribution form and the omission
of inappropriate or confusing theoretical (parametric or model-based) distributions. Techniques are then
described for fitting theoretical distributions to observed data, including the use of maximum likelihood
estimators, optimising goodness-of-fit statistics and plots.
For both non-parametric and parametric distribution fitting, I have offered two approaches. The first
approach provides a first-order distribution, i.e. a best-fitting (best-guess) distribution that describes
the variability only. The second approach provides second-order distributions that describe both the
variability of the variable and the uncertainty we have about what that true distribution of variability
really is. Second-order distributions are more complete than their first-order counterparts and require
more effort: if there is a sufficiently large set of data such that the inclusion of uncertainty provides only
marginally more information, it is quite reasonable to approximate the distribution to one of variability
only. That said, it is often difficult to gauge the degree of uncertainty one has about a distribution
without having first formally determined its uncertainty. The reader is therefore encouraged at least to
go through the exercise of describing the uncertainty of a variability distribution to determine whether
the uncertainty needs to be included.

10.1 Analysing the Properties of the Observed Data
Before attempting to fit a probability distribution to a set of observed data, it is worth first considering
the properties of the variable in question. The properties of the distribution or distributions chosen to
be fitted to the data should match those of the variable being modelled. Software like BestFit, EasyFit,
Stat::Fit and ExpertFit have made fitting distributions to data very easy and removed the need for any
in-depth statistical knowledge. These products can be very useful but, through their automation and
ease of use, inadvertently encourage the user to attempt fits to wholly inappropriate distributions. It is
therefore worth considering the following points before attempting a fit:
Is the variable to be modelled discrete or continuous? A discrete variable may only take certain
specific values, for example the number of bridges along a motorway, but a measurement such as
the volume of tarmac, for example, is continuous. A variable that is discrete in nature is usually, but
not always, best fitted to a discrete distribution. A very common exception is where the increment
between contiguous allowable values is insignificant compared with the range that the variable may
take. For example, consider a distribution of the number of people using the London Underground
on any particular day. Although there can only be a whole number of people using the Tube, it
is easier to model this number as a continuous variable since the number of users will number in
the millions and there is little importance and considerable practical difficulty in recognising the
discreteness of the number.
In certain circumstances, discrete distributions can be very closely approximated by continuous
distributions for large values of x. If a discrete variable has been modelled by a continuous distribution for convenience, its discrete nature can easily be put back into the risk analysis model by
using the ROUND(...) function in Excel.
The reverse of the above, however, never occurs, i.e. data from a continuous variable are always
fitted to a continuous distribution and never a discrete distribution.
Do I really need to fit a mathematical (parametric) distribution to my data? It is often practical to
use the data points directly to define an empirical distribution, without having to attempt a fit to any
theoretical probability distribution type. Section 10.2 describes these methods.
Does the theoretical range of the variable match that of the fitted distribution? The fitted distribution
should, within reason, cover the range over which the variable being modelled may theoretically
extend. If the fitted distribution extends beyond the variable's possible range, a risk analysis model
will produce impossible scenarios. If the distribution fails to extend over the entire possible range of
the variable, the risk analysis will not reflect the true uncertainty of the problem. For example, data
on the oil saturation of a hydrocarbon reserve should be fitted to a distribution that is bounded at
zero and 1, as values outside that range are nonsensical. It may turn out that a normal distribution,
for example, fits the data far better than any other shape, but, of course, it extends from −∞ to +∞.
In order to ensure that the risk analysis only produces meaningful scenarios, the normal distribution
would be truncated in the risk analysis model at zero and 1.
Note that a correctly fitted distribution will usually cover a range that is greater than that displayed
by the observed data. This is quite acceptable because data are rarely observed at the theoretical
extremes for the variable in question.
Do you already know the value of the distribution parameters? This applies most often to discrete
variables. For example, a Hypergeometric(n, D, M) distribution describes the number of successes
we might have when sampling n individuals without replacement from a population of size M


where a success means the individual comes from a subpopulation of size D. It seems unlikely that
we would not know how many samples were taken to have observed our dataset of successes. More
likely is that we already know n and D and are trying to estimate M, or we know n and M and are
trying to estimate D. Discrete distributions like the binomial, beta-binomial, negative binomial, beta
negative binomial, hypergeometric and inverse hypergeometric have either the number of samples
n or the number of required successes s as parameters, and these will generally be known.
Is this variable independent of other variables in the model? The variable may be correlated with, or
a function of, another variable within the model. It may also be related to another variable outside the
model but which, in turn, affects a third variable within the risk analysis model. Figure 10.1 illustrates
a couple of examples. In example (a), a high-street bank's revenue is modelled as a function of the
interest and mortgage rates, among other things. The mortgage rate is correlated to the interest rate
since the interest rate largely defines what the mortgage rate is to be. This relationship must be
included in the model to ensure that the simulation will only produce meaningful scenarios. There
are two approaches to modelling such dependency relationships:


1. Determine distributions for the mortgage and interest rates on the basis of historical data and then correlate the sampling from these distributions during simulation.
Figure 10.1 Examples of dependency between model variables: (a) direct (interest rate and mortgage rates, linked by correlation, feeding the model as functions); (b) indirect (the choice of roofing material affecting both the subcontractor's cost model and the person-hours to construct the roof timbers, and hence the subcontractor's quote to supply labour for roof construction).

2. Determine the distribution of interest rate from historical data and a (stochastic) functional relationship with the mortgage rate.
Method 1 is tempting because of its simple execution, but method 2 offers greater opportunity to
reproduce any observed relationship between the two variables.
In example (b) of Figure 10.1, a construction subcontractor is calculating her bid price to supply
labour for a roofing job. The choice of roofing material has not yet been decided and this uncertainty
has implications for the person-hours that will be needed to construct the roofing timbers and to lay
the roof. There is therefore an indirect dependency between these two variables that could easily have
been missed, had she not looked outside the immediate components of her cost calculation. Missing
this correlation would have resulted in an underestimation of the spread of the subcontractor's cost
and potentially could have led her to quote a price that exposed her to significant loss. Correlation
and dependency relationships form a vital part of many risk analyses. Chapter 13 describes several
techniques to model correlation and dependencies between variables.
Does a theoretical distribution exist that fits the mathematics of this variable? Many theoretical distributions have developed as a result of modelling specific types of problem. These distributions then
find a wider use in other problems that have the same mathematical structure. Examples include: the
times between telephone calls at a telephone exchange or fires in a railway system may be accurately
represented by an exponential distribution; the time until failure of an electronics component may be
represented by a Weibull distribution; how many treble 20s a darts player will score with a specific
number of darts may be represented by a binomial distribution; the number of cars going through a
road junction in any one hour may be represented by a Poisson distribution; and the heights of the
tallest and shortest children in UK school classes may be represented by Gumbel distributions. If a
distribution can be found with the same mathematical basis as the variable being modelled, it only
remains to find the appropriate parameters to define the distribution, as explained in Section 10.3.
Does a theoretical distribution exist that is well known to fit this type of variable? Many types of
variable have been observed closely to follow specific distribution types without any mathematical
rationale being available to explain such close matching. Examples abound with the normal distribution: the weight of babies and other measures that come from nature, which is how the normal
distribution got its name; measurement errors in engineering, variables that are the sum of other
variables (e.g. means of samples from a population), etc. However, there are many other examples
for distributions like the lognormal, Pareto and Rayleigh, some of which are noted in Appendix 111.
If a distribution is known to be a close fit to the type of variable being modelled, usually as a result of
published academic work, all that remains is to find the best-fitting distribution parameters, as explained
in Section 10.3.
Errors - systematic and non-systematic

The collected data will at times have measurement errors that add another level of uncertainty. In
most scientific data collection, the random error is well understood and can be quantified, usually by
simply repeating the same measurement and reviewing the distribution of results. Such random errors are
described as non-systematic. Systematic errors, on the other hand, mean that the values of a measurement
deviate from the true value in a systematic fashion, consistently either over- or underestimating the true
value. This type of error is often very difficult to identify and quantify. One will often attempt to estimate
the degree of suspected systematic measurement error by comparing with measurements using another
technique that is known (or believed) to have little or no systematic error.


Systematic and non-systematic error can both be accounted for in determining a distribution of fit.
In determining a first-order distribution, one need only adjust the data by the systematic error (the nonsystematic error has, by definition, a mean shift of zero). In second-order distribution fitting, one can
model the data as being uncertain, with appropriate distributions representing both the non-systematic
error and the systematic error (including uncertainty about what these error parameters are).
Sample size

Is the number of data points available sufficient to give a good idea of the true variability? Consider the
20 plots of Figure 10.2, which each show random samples of twenty values drawn from a Normal(100,
10) distribution. These samples are all plotted as histograms with six evenly spaced bars, three either
side of 100. The variation in shapes is something of an eye-opener to a lot of people, who expect
to see plots that look at least reasonably like bell-shaped curves and symmetric about 100. After all,
one might think that 20 data points is a reasonable number from which to draw some inference. The
bottom-right panel in Figure 10.2 shows all 400 data values (i.e. 20 plots * 20 data values each), which
looks something like a normal distribution but nonetheless still has a significant degree of asymmetry.
It is an interesting and useful exercise when attempting to fit data to a distribution to see what sort of
patterns one would observe if the data did truly come from the distribution that is being fitted. So, for
example, if I had 30 data values that I was fitting to a Lognormal(10, 2) distribution, I could plot a variety of 30-value Monte Carlo samples (not Latin hypercube samples, which force a better-fitting sample to the true distribution than a random sample would produce) from a Lognormal(10, 2) distribution in
histogram form and see the different patterns they produce. I am at least then aware of the range of data
patterns that I should accept as feasibly coming from that distribution for that size of sample.
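To make this check concrete, here is a minimal sketch (in Python with numpy and matplotlib, which are not the tools used elsewhere in this book) that draws several sets of 30 random values from a Lognormal(10, 2) - taken here to be parameterised by its mean and standard deviation - and plots each set as a histogram. The seed and panel layout are arbitrary choices; the point is the spread of shapes one should accept as plausible for that sample size.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Lognormal(10, 2) taken as mean = 10, sd = 2; convert to the mu and sigma
# of the underlying normal distribution that numpy's generator expects.
m, s = 10.0, 2.0
sigma2 = np.log(1 + (s / m) ** 2)
mu = np.log(m) - sigma2 / 2

fig, axes = plt.subplots(3, 4, figsize=(12, 8))
for ax in axes.flat:
    sample = rng.lognormal(mu, np.sqrt(sigma2), size=30)  # plain Monte Carlo, not Latin hypercube
    ax.hist(sample, bins=8)
fig.suptitle("Twelve sets of 30 random samples from a Lognormal(10, 2)")
plt.show()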
Overdispersion of data

Sometimes we wish to fit a parametric distribution to observations, but note that the data appear to
show a much larger spread than the fitted distribution would suggest. For example, in fitting a binomial
distribution to the results of a multiple-question exam taken by a large class, one might imagine that the distribution of the number of correct answers could be modelled by a Binomial(n, p) distribution, where n = the number of questions and p is the average probability for the class of correctly answering a question. The spread of the fitted binomial distribution is essentially determined by the mean = np, since
n is fixed, so there is no opportunity to attempt to match the fitted distribution to the data in terms of
the observed spread in results as well as the average result. One plausible reason for the fit being poor is
that there will be a range of abilities in the class. If one models the range of probabilities of successfully
answering a question across all the individuals as a beta distribution, the resultant distribution of results
will be drawn from a beta-binomial distribution, which is then the appropriate distribution to fit to
the data. The extra variability added to the binomial distribution by making p beta distributed means
that the beta-binomial distribution will always have more spread than the binomial. The beta-binomial
distribution has three parameters: α, β and n, where α and β (sometimes written as α1 and α2) are the parameters of the beta distribution and n remains the number of trials. These three parameters allow a better and logical match to the mean and variance of the observations. As α and β become larger, the beta distribution becomes narrower, i.e. the participants have a narrow range of probabilities of successfully answering a question (the population is more homogeneous), and the Beta-Binomial(n, α, β) is then approximated well by a Binomial(n, α/(α + β)).
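A quick way to see the extra spread is to simulate exam results with and without a beta-distributed p. The sketch below (Python/numpy; the class size, question count and Beta(14, 6) ability distribution are illustrative values of my own, chosen so that the mean ability is still 0.7) shows the beta-binomial scores having a larger standard deviation than the plain binomial scores.

import numpy as np

rng = np.random.default_rng(1)
n_questions, n_students = 50, 10000

# Homogeneous class: every student answers each question with probability p = 0.7
scores_binomial = rng.binomial(n_questions, 0.7, n_students)

# Heterogeneous class: each student's p is drawn from a Beta(14, 6),
# whose mean is 14/(14 + 6) = 0.7
p_students = rng.beta(14, 6, n_students)
scores_betabinomial = rng.binomial(n_questions, p_students)

print("binomial:      mean %.2f  sd %.2f" % (scores_binomial.mean(), scores_binomial.std()))
print("beta-binomial: mean %.2f  sd %.2f" % (scores_betabinomial.mean(), scores_betabinomial.std()))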
The same type of problem applies in fitting the Poisson(λ) distribution to data. Since the mean and variance are both equal to λ, the spread of the distribution is determined by the mean. Observed data are often more widely dispersed than a Poisson distribution might suggest, and this is often because the
Figure 10.2 Examples of distributions of 20 random samples from a Normal(100, 10) distribution.

n observations come from Poisson processes with different means λ1, . . . , λn. For example, one might be looking at the failure rates of computers. Each computer will be slightly different from the next and so will have its own λ. If one models the distribution of variability of the λs using a Gamma(α, β) distribution, the resultant distribution of failures in a single time period is a Pólya(α, β). The Pólya distribution always has a variance greater than the mean, and its two parameters allow a greater flexibility in matching the distribution to the mean and variance of the observations.
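The same idea can be checked by simulation. The sketch below (Python/numpy, with Gamma parameters chosen arbitrarily for illustration) mixes a Poisson over gamma-distributed λs and recovers the Pólya mean αβ and variance αβ(1 + β), which always exceeds the mean.

import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 3.0, 0.8              # illustrative Gamma(alpha, beta) parameters
n_computers = 100000

lam = rng.gamma(alpha, beta, n_computers)   # each computer's own failure rate
failures = rng.poisson(lam)                 # Polya(alpha, beta) counts in one period

print("simulated: mean %.3f  variance %.3f" % (failures.mean(), failures.var()))
print("theory:    mean %.3f  variance %.3f" % (alpha * beta, alpha * beta * (1 + beta)))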
Finally, data fitted to a normal distribution can often demonstrate longer tails than a normal distribution. In such cases, the three-parameter Student t-distribution can be used, i.e. Student(ν) * σ + μ, where μ is the fitted distribution's mean, σ is the fitted distribution's standard deviation and ν is the "degrees of freedom" parameter that determines the shape of the distribution. For ν = 1, this is the Cauchy distribution, which has infinite (i.e. undeterminable) mean and standard deviation. As ν gets larger, the tails shrink until at very large ν (some 50 or more) this looks like a Normal(μ, σ) distribution. The three-parameter Student t-distribution can be derived as the mixture of normal distributions with the same mean and different variances distributed as a scaled inverse χ². So, in attempting to fit data to the three-parameter Student t-distribution instead of a normal distribution, you would need to be able reasonably to convince yourself that the observations were drawn from normal distributions with the same mean and different variances.
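The equivalence is easy to verify numerically. The sketch below (Python/numpy, with illustrative values of ν, μ and σ) generates the three-parameter Student t directly as Student(ν) * σ + μ and again as a mixture of normals whose variances carry a scaled inverse χ² factor; the two sets of quantiles agree to within simulation error.

import numpy as np

rng = np.random.default_rng(3)
nu, mu, sigma = 4, 100.0, 10.0      # illustrative parameter values
n = 200000

t_direct = mu + sigma * rng.standard_t(nu, n)          # Student(nu) * sigma + mu

w = nu / rng.chisquare(nu, n)                          # scaled inverse chi-square factor
t_mixture = rng.normal(mu, sigma * np.sqrt(w))         # normals with common mean, mixed variances

for q in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(q, round(np.quantile(t_direct, q), 2), round(np.quantile(t_mixture, q), 2))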


10.2 Fitting a Non-Parametric Distribution to the Observed Data
This section discusses techniques for fitting an empirical distribution to data. We look at continuous
and then discrete variables, and both first-order (variability only) and second-order (variability and
uncertainty) fitting.

10.2.1 Modelling a continuous variable (first order)
If the observed variable is continuous and reasonably extensive, it is often sufficient to use a cumulative
frequency plot of the data points themselves to define its probability distribution. Figure 10.3 illustrates
an example with 18 data points. The observed F(x) values are calculated as the expected F(x) that would correspond to a random sampling from the distribution, i.e. F(x) = i/(n + 1), where i is the rank of the observed data point and n is the number of data points. An explanation for this formula is provided in the next section. Determination of the empirical cumulative distribution proceeds as follows:

The minimum and maximum for the empirical distribution are subjectively determined on the basis
of the analyst's knowledge of the variable. For a continuous variable, these values will generally be
outside the observed range of the data. The minimum and maximum values selected here are 0 and
45.
The data points are ranked in ascending order between the minimum and maximum values.
The cumulative probability F(xi) for each xi value is calculated as:

F(xi) = i/(n + 1)    (10.1)

This formula maximises the chance of replicating the true distribution.

Figure 10.3 Fitting a continuous empirical distribution to data using a cumulative distribution (n = 18 data points, plotted over the range 0 to 45).

The two arrays, {xi} and {F(xi)}, along with the minimum and maximum values, can then be used as direct inputs into a cumulative distribution CumulA(min, max, {xi}, {F(xi)}). The VoseOgive function in ModelRisk will simulate values from a distribution constructed using the method above.
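For readers working outside @RISK or ModelRisk, the construction is easy to reproduce. The sketch below (Python/numpy, with 18 made-up data values and the subjective minimum 0 and maximum 45 of the example above) ranks the data, assigns F(xi) = i/(n + 1) and then samples the resulting piecewise-linear cumulative distribution by inverse transformation, which is essentially what a CumulA-type function does.

import numpy as np

rng = np.random.default_rng(4)

data = np.array([3.1, 5.6, 7.2, 8.8, 9.4, 11.0, 12.7, 14.1, 16.3,
                 18.0, 19.5, 21.2, 23.8, 26.0, 28.4, 31.1, 34.7, 39.2])  # illustrative values
xmin, xmax = 0.0, 45.0                   # subjective minimum and maximum

x = np.sort(data)
n = len(x)
F = np.arange(1, n + 1) / (n + 1)        # F(xi) = i/(n + 1)

xs = np.concatenate(([xmin], x, [xmax]))  # cumulative curve through (xmin, 0) ... (xmax, 1)
Fs = np.concatenate(([0.0], F, [1.0]))

u = rng.uniform(size=10000)               # inverse-transform sampling from the ogive
samples = np.interp(u, Fs, xs)
print(samples.mean(), np.percentile(samples, [5, 50, 95]))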
If there is a very large amount of data, it becomes impracticable to use all of the data points to define
the cumulative distribution. In such cases it is useful to batch the data first. The number of batches
should be set to the practical maximum that balances fineness of detail (large number of bars) with the
practicalities of having large arrays defining the distribution (lower number of bars).
Example 10.1 Fitting a continuous non-parametric distribution to data

Figure 10.4 illustrates an example where 221 data points are plotted in histogram form over the range
of the observed data. The analyst considers that the variable could conceivably range from 0 to 300.
Since there are no observed data with values below 20 and above 280, the histogram bar ranges need

to be altered to accommodate the subjective minimum and maximum. The easiest way to achieve this is to extend the range of the first and last bars with non-zero probability to cover the required range, but without altering their probabilities. In this example, the histogram bar with range 20-40 is expanded to a range 0-40, and the bar with range 260-280 is expanded to a range 260-300. We will probably have slightly exaggerated the tails of the distribution. However, if the number of bars initially selected is quite large, there will be little real effect on the model. The {xi} array input into the cumulative distribution is then {40, 60, . . . , 240, 260}, the {Pi} array is {0.018, 0.131, . . . , 0.986, 0.995} and the minimum and maximum are, of course, 0 and 300 respectively. ◆

Figure 10.4 Fitting an empirical distribution to histogrammed data using a cumulative distribution (relative and cumulative frequency of the n = 221 data points).

Observed frequencies:

Histogram bar        Histogram probability    Cumulative probability
From A    To B       f(A < x <= B)            F(x <= B)
20        40         0.018                    0.018
40        60         0.113                    0.131
60        80         0.204                    0.335
80        100        0.199                    0.534
100       120        0.145                    0.679
120       140        0.118                    0.796
140       160        0.050                    0.846
160       180        0.045                    0.891
180       200        0.045                    0.937
200       220        0.023                    0.959
220       240        0.027                    0.986
240       260        0.009                    0.995
260       280        0.005                    1.000

Modelled distribution:

Histogram bar        Cumulative probability
From A    To B       F(x <= B)
0         40         0.018
40        60         0.131
60        80         0.335
80        100        0.534
100       120        0.679
120       140        0.796
140       160        0.846
160       180        0.891
180       200        0.937
200       220        0.959
220       240        0.986
240       260        0.995
260       300        1.000


Converting a histogram distribution into a cumulative distribution may seem a little pointless when
the histogram can be used in a risk analysis model. However, this technique allows analysts to select
varying bar widths to suit their needs, as in the above example, and therefore to maximise detail in the
distribution where it is needed.

10.2.2 Modelling a continuous variable (second order)¹
When we do not have a great deal of data, a considerable amount of uncertainty will remain about an
empirical distribution determined directly from the data. It would be very useful to have the flexibility
of using an empirical distribution, i.e. not having to assume a parametric distribution, and also to
be able to quantify the uncertainty about that distribution. The following technique meets both requirements.
Consider a set of n data values {xj} drawn from a distribution and ranked in ascending order {xi} so that xi < xi+1. Data thus ranked are known as the order statistics of {xj}. Individually, each of the values of {xj} may map as a U(0, 1) onto the cumulative probability of the parent distribution F(x). We take a U(0, 1) distribution as the prior distribution for the cumulative probability for any value of x. We can thus use a U(0, 1) prior for Pi = F(xi) for the value of the ith observation. However, we have the additional information that, of n values drawn randomly from this distribution, xi ranked ith, i.e. (i - 1) of the data values are less than xi and (n - i) values are greater than xi. Using Bayes' theorem and the binomial theorem, the posterior marginal distribution for Pi can readily be determined, remembering that Pi has a U(0, 1) prior and therefore a prior probability density = 1:

f(Pi | xi ranked ith of n) ∝ Pi^(i-1) * (1 - Pi)^(n-i)

which is simply the standard beta distribution Beta(i, n - i + 1):

Pi = Beta(i, n - i + 1)    (10.2)

Equation (10.2) could actually be determined directly from the fact that the beta distribution is the conjugate prior to the binomial likelihood function and that a U(0, 1) = Beta(1, 1). The mean of the Beta(i, n - i + 1) distribution equals i/(n + 1): the formula that was used in Equation (10.1) to estimate the best-fitting first-order non-parametric cumulative distribution.
Since Pi+1 > Pi, these beta distributions are not independent, so we need to determine the conditional distribution f(Pi+1 | Pi), as follows. The joint distribution f(Pi, Pj) for any two Pi, Pj is calculated using the binomial theorem in a similar manner to the numerator of the equation for f(Pi | xi; i = 1, n), that is

f(Pi, Pj) ∝ Pi^(i-1) * (Pj - Pi)^(j-i-1) * (1 - Pj)^(n-j)

where Pj > Pi, and remembering that the prior probability densities for Pi and Pj equal 1 since they have U(0, 1) priors. Thus, for j = i + 1,

f(Pi, Pi+1) ∝ Pi^(i-1) * (1 - Pi+1)^(n-i-1)

¹ I submitted a paper on this technique (I developed the idea) for publication in a journal a long time ago. One reviewer was horribly dismissive, saying that the derivation was one of the most drunken she had ever seen, and anyway it was a Bayesian method (it isn't) so it was of no value. Actually, this has proven to be one of the most useful things I ever figured out.


The conditional probability f(Pi+1 | Pi) is thus given by

f(Pi+1 | Pi) = k * (1 - Pi+1)^(n-i-1) / (1 - Pi)^(n-i)

where k is some constant. The corresponding cumulative distribution function F(Pi+1 | Pi) is then given by

F(Pi+1 | Pi) = (k/(n - i)) * [1 - ((1 - Pi+1)/(1 - Pi))^(n-i)]

and, since F(Pi+1 | Pi) must equal 1 at Pi+1 = 1, k = n - i and the formula reduces to

F(Pi+1 | Pi) = 1 - ((1 - Pi+1)/(1 - Pi))^(n-i)    (10.3)

Together, Equations (10.2) and (10.3) provide us with the tools to construct a non-parametric second-order distribution for a continuous variable given a dataset sampled from that distribution. The distribution for the cumulative probability P1 that maps onto the first order statistic x1 can be obtained from Equation (10.2) by setting i = 1:

P1 = Beta(1, n)    (10.4)

The distribution for the cumulative probability P2 that maps onto the second order statistic x2 can be obtained from Equation (10.3). Being a cumulative distribution function, F(Pi+1 | Pi) is Uniform(0, 1) distributed. Thus, writing Ui+1 to represent a Uniform(0, 1) distribution in place of F(Pi+1 | Pi), using the identity 1 - U(0, 1) = U(0, 1), and rewriting for Pi+1, we obtain

Pi+1 = 1 - (1 - Pi) * Ui+1^(1/(n-i))    (10.5)

which gives

P2 = 1 - (1 - P1) * U2^(1/(n-1))
P3 = 1 - (1 - P2) * U3^(1/(n-2))

etc.
Note that each of the U2, U3, . . . , Un uniform distributions is independent of the others.
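Outside a spreadsheet environment, Equations (10.4) and (10.5) can be applied directly. The sketch below (Python/numpy; the 100 normally distributed data values are simply a stand-in dataset) generates one uncertainty realisation of the cumulative probabilities P1, . . . , Pn for the ranked observations.

import numpy as np

def second_order_F(n, rng):
    """One 'outer loop' realisation of P1..Pn using
    P1 = Beta(1, n) and Pi+1 = 1 - (1 - Pi) * U^(1/(n - i))."""
    P = np.empty(n)
    P[0] = rng.beta(1, n)
    for i in range(1, n):                        # i is the 0-based index of Pi+1
        P[i] = 1 - (1 - P[i - 1]) * rng.uniform() ** (1.0 / (n - i))
    return P

rng = np.random.default_rng(5)
data = np.sort(rng.normal(100, 10, 100))         # ranked observations (order statistics)
P = second_order_F(len(data), rng)
# Each realisation of P, together with subjective minimum and maximum values,
# defines one candidate cumulative distribution for the inner (variability) loop.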
The formulae from Equations (10.4) and (10.5) can be used as inputs into a cumulative distribution function available from standard Monte Carlo software tools like @RISK and Crystal Ball, together with subjective estimates of the minimum and maximum values that the variable may take. The variability ("inner loop") is described by the range for the variable in question and estimates of the cumulative distribution shape via the {xi} and {Pi} values. The uncertainty ("outer loop") is catered for by the uncertainty distributions for the minimum, maximum and Pi values.


The RiskCumul distribution function in @RISK, the VoseCumulA function in ModelRisk and the cumulative version of the custom distribution in Crystal Ball have the same cumulative distribution function, namely

F(x) = Pi + (Pi+1 - Pi) * (x - Xi)/(Xi+1 - Xi)

where X0 = minimum, Xn+1 = maximum, P0 = 0, Pn+1 = 1 and Xi <= x < Xi+1.
Figure 10.5 illustrates a model where a dataset is being used to create a second-order distribution using this technique. If the model is created in the current version of @RISK, the uncertainty distributions for F(x) in column D are nominated as outputs, a smallish number of iterations are run and the resultant data are exported back to a spreadsheet. Those data are then used to perform multiple simulations (the "outer loop") of uncertainty using @RISK's RiskSimtable function; the "inner loop" of variability comes from the cumulative distribution itself, as shown in Figure 10.6.
Figure 10.5 Model to produce a second-order non-parametric continuous distribution (the F(x) value for the first order statistic is generated with =VoseBeta(1,100)).

Figure 10.6 @RISK model to run a second-order risk analysis using the data generated from the model of Figure 10.5 (=RiskSimtable(C3:C102) lists the simulated samples for F(x); =VoseCumulA(0,100,C104:CX104,C103:CX103) constructs the cumulative distribution from the observed data values).

Figure 10.7 Crystal Ball Pro model to run a second-order risk analysis using the data generated from the model of Figure 10.5 (cell D4 = CB.Beta(1,100,1); cells D5:D103 = 1-(CB.Uniform(0,1)^(1/(100-B3)))*(1-D3); the F(x) cells are nominated as uncertainty distributions and the custom cumulative distribution as the variability distribution).

If one creates the model in Crystal Ball Pro, the F(x) distributions can be nominated as uncertainty distributions and the cumulative distribution nominated as the variability distribution, and the inner/outer loop procedure will run automatically (Figure 10.7).
There are a few limitations to this technique. In using a cumulative distribution function, one is
assuming a histogram style probability density function. When there are a large number of data points,
this approximation becomes irrelevant. However, for small datasets the approximation tends to accentuate
the tails of the distribution: a result of the histogram "squaring-off" effect of using the cumulative distribution. In other words, the variability will be slightly exaggerated. However, the squaring effect
can be reduced, if required, by using some sort of smoothing algorithm and defining points between
each observed value. In addition, for small datasets, the tails' contribution to the variability will often
be more influenced by the subjective estimates of the minimum and maximum values: a fact one can
view positively (one is recognising the real uncertainty about a distribution's tail), and negatively (the
smaller the dataset, the more the technique relies on subjective assessment).
The fewer the data points, the wider the confidence intervals will become, quite naturally, and, in
general, the more emphasis will be placed on the subjectively defined minimum and maximum values.
Conversely, the more data points available, the less influence the minimum and maximum estimates will
have on the estimated distribution. In any case, the values of the minimum and maximum only have
influence on the width (and therefore height) of the end two histogram bars in the fitted distribution. The
fact that the technique is non-parametric, i.e. that no statistical distribution with a particular cumulative
distribution function is assumed to be underlying the data, allows the analyst a far greater degree of
flexibility and objectivity than that afforded by fitting parametric distributions.
A further sophistication to this technique would be to correlate the uncertainty distributions for the minimum and maximum parameter values to the uncertainty distributions for P1 and Pn respectively. If P1 were to be sampled with a high value, it would make sense that the variability distribution had a long left tail and the value sampled for the minimum should be towards its lowest value. Similarly, a high value for Pn would suggest a low value for the maximum. One could model these relationships
using either very high levels of negative rank order correlation for simplicity or some more involved
but more explicit equation.


Example 10.2 Fitting a second-order non-parametric distribution to continuous data

Three datasets of five, then a further 15 and then another 80 random samples were drawn from a
Normal(100, 10) distribution to give sets of five, 20 and 100 samples. The graphs of Figure 10.8 show,
naturally enough, that the population distribution is approached with increasing confidence the more
data values one has available.
There are classical statistical techniques for determining confidence distributions for the mean and standard deviation of a normal distribution that is fitted to a dataset with a population normal distribution, as discussed in Section 9.1, namely:

Mean: μ = x̄ + (s/√n) * t(n - 1)

Standard deviation: σ = s * √((n - 1)/χ²(n - 1))

where:

μ and σ are the mean and standard deviation of the population distribution;
x̄ and s are the mean and sample standard deviation of the n data points being fitted;
t(n - 1) is a t-distribution with n - 1 degrees of freedom, and χ²(n - 1) is a chi-square distribution with n - 1 degrees of freedom.

Figure 10.8 Results of fitting a non-parametric distribution to data from a normal parent distribution: (a) five data points; (b) 20 data points; (c) 100 data points; (d) the true population distribution.
The second-order distribution that would be fitted to the 100 data point set using the non-parametric
technique is shown in the right-hand panel of Figure 10.9. The second-order distribution produced using
the above statistical theory with the assumption of a normal distribution is shown in the left-hand panel
of Figure 10.9. There is a strong agreement between these two techniques. The statistical technique
produces less uncertainty in the tails because the assumption of normality adds extra information that
the non-parametric technique does not provide. This is, of course, fine providing we know that the
population distribution is truly normal, but leads to overconfidence in the tails if the assumption is incorrect. ◆
The advantage of the technique offered here is that it works for all continuous smooth distributions,
not just the normal distribution. It can also be used to determine distributions of uncertainty for specific
percentiles and quantiles of the population distribution, essentially by reading off values from the fitted
cumulative distribution and interpolating as necessary between the defined points. Figure 10.10 shows
a spreadsheet model for determining the percentile, defined in cell E3, of the population distribution,
given the 100 data points from the normal distribution used previously. The uncertainty distribution for
the percentile is produced by running a simulation with cell G3 as the output. Similarly, Figure 10.11
illustrates a spreadsheet to determine the cumulative probability that the value in cell F2 represents in
the population distribution. The distribution of uncertainty of this cumulative probability is produced
by running a simulation with cell H2 as the output. In other words, the model in Figure 10.10 is
slicing horizontally through the second-order fitted distribution at F(x) = 50 %, while the model of
Figure 10.11 is slicing vertically at x = 99. The spreadsheets can, of course, be expanded or contracted
to suit the number of data points available. ModelRisk includes the VoseOgive2 function that generates
the array of F(x) variables required for second-order distribution modelling.
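The same slicing can be done without VLOOKUP tables. The sketch below (Python/numpy; the dataset, minimum and maximum are stand-ins for the example's values) repeats the P recursion of Equation (10.5) many times and interpolates each realisation, horizontally at F(x) = 50 % and vertically at x = 99, to build the two uncertainty distributions.

import numpy as np

rng = np.random.default_rng(6)
data = np.sort(rng.normal(100, 10, 100))   # stand-in for the 100 observed values
xmin, xmax = 50.0, 150.0                   # subjective minimum and maximum
n = len(data)
xs = np.concatenate(([xmin], data, [xmax]))

median_est, prob_at_99 = [], []
for _ in range(5000):                      # outer (uncertainty) loop
    P = np.empty(n)
    P[0] = rng.beta(1, n)
    for i in range(1, n):
        P[i] = 1 - (1 - P[i - 1]) * rng.uniform() ** (1.0 / (n - i))
    Fs = np.concatenate(([0.0], P, [1.0]))
    median_est.append(np.interp(0.5, Fs, xs))    # slice horizontally at F(x) = 50 %
    prob_at_99.append(np.interp(99.0, xs, Fs))   # slice vertically at x = 99
print("50th percentile:", np.percentile(median_est, [5, 50, 95]))
print("F(99):          ", np.percentile(prob_at_99, [5, 50, 95]))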
Figure 10.9 Comparison of second-order distributions using the non-parametric technique and classical statistics: (a) statistical theory distribution; (b) non-parametric distribution.


Figure 10.10 Model to determine the uncertainty distribution for a percentile (the uncertain F(x) values are generated as in Figure 10.5 and the required percentile is read off the fitted cumulative distribution with VLOOKUP interpolation).

10.2.3 Modelling a discrete variable (first order)
Data from a discrete variable can be used to define an empirical distribution in two ways: if the number
of allowable x values is not very large, the frequency of data at each x value can be used directly to
define a discrete distribution; and if the number of allowable x values is very large, it is usually easier
to arrange the data into histogram form and then define a cumulative distribution, as above. The discrete
nature of the variable can be reintroduced by imbedding the cumulative distribution inside the standard
spreadsheet ROUND(. . .) function.

10.2.4 Modelling a discrete variable (second order)
Figure 10.11 Model to determine the uncertainty distribution for a quantile (the uncertain F(x) values are generated with =VoseBeta(1,100) and =1-(VoseUniform(0,1)^(1/(100-B108)))*(1-C108), and the cumulative probability of the target value is interpolated with VLOOKUP functions).

Uncertainty can be added to the discrete probabilities in the previous technique to provide a second-order discrete distribution. Assuming that the variable in question is stable (i.e. is not varying with time), there is a constant (i.e. binomial) probability pi that any observation will have a particular value xi. If k of the n observations have taken the value xi, then our estimate of the probability pi is given by Beta(k + 1, n - k + 1) from Section 8.2.3. However, all these pi probabilities have to sum to equal 1.0, so we normalise the pi values. Figure 10.12 illustrates a spreadsheet that calculates the discrete second-order non-parametric distribution for the set of data in Table 10.1, where the distribution has been assumed to finish with the maximum observed value.
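The normalisation step is straightforward to reproduce. The sketch below (Python/numpy; the value/frequency pairs are illustrative rather than the exact Table 10.1 data) draws one uncertainty realisation of the pi values from Beta(k + 1, n - k + 1), rescales them to sum to 1 and then samples the discrete variable from them.

import numpy as np

rng = np.random.default_rng(8)
values = np.arange(17)                                       # observed x values 0..16
freq = np.array([0, 1, 1, 12, 35, 52, 61, 65, 69, 68,
                 46, 33, 26, 13, 10, 2, 5])                  # illustrative frequencies
n = freq.sum()

p = rng.beta(freq + 1, n - freq + 1)    # one realisation of each pi = Beta(k+1, n-k+1)
p = p / p.sum()                         # normalise so the probabilities sum to 1

sample = rng.choice(values, size=10000, p=p)    # inner (variability) loop
print(np.round(p, 4))
print(sample.mean())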
There remains a difficulty in selecting the range of this distribution, and it will be a matter of judgement how far one extends the range beyond the observed values. In the simple form described here there is also a problem in determining the pi values for these unobserved tails, and any middle range that has no observed values, since all such pi values will have the same (normalised) Beta(1, n + 1) distribution, no matter how extreme their position in the distribution's tail. This obviously makes no sense, and, if it is important to recognise the possibility of a long tail beyond the observed data, a modification is necessary. The tail can be forced to zero by multiplying the beta distributions by some function that attenuates the tail, although the choice of function and severity of the attenuation will ultimately be a subjective matter.
These last two techniques have the advantages that the distribution derived from the observed data would be unaffected by any subjectivity in selecting a distribution type and that the maximum use of the data has been made in defining the distribution. There is an obvious disadvantage in that the process is fairly laborious for large datasets. However, the FREQUENCY() function and Histogram facility in Excel and the BestFit statistics report and other statistics packages can make sorting the data and calculating the cumulative frequencies very easy. More importantly, there remains a difficulty in estimating probabilities for values of the variable that have not been observed. If this is important, it may well be better to fit the data to a parametric distribution.

Figure 10.12 Model to determine a discrete non-parametric second-order distribution (the estimate of each probability is =VoseBeta(C3+1,$C$23-C3+1), where column C holds the observed frequencies and C23 = SUM(C3:C22) is the total number of observations; the estimates are then normalised to sum to 1).

Table 10.1 Dataset to fit a discrete second-order non-parametric distribution (pairs of values and observed frequencies).


10.3 Fitting a First-Order Parametric Distribution to Observed Data
This section describes methods of finding a theoretical (parametric) distribution that best fits the observed
data. The following section deals with fitting a second-order parametric distribution, i.e. a distribution
where the uncertainty about the parameters needs to be recognised. A parametric distribution type may
be selected as the most appropriate to fit the data for three reasons:
The distribution's mathematics corresponds to a model that accurately represents the behaviour of
the variable being considered (see Section 10.1).
The distribution to be fitted to the data is well known to fit this type of variable closely (see
Section 10.1 again).
The analyst simply wants to find the theoretical distribution that best fits the data, whatever it may be.
The third option is very tempting, especially when distribution-fitting software is available that can
automatically attempt fits to a large number of distribution types at the click of an icon. However, this
option should be used with caution. Analysts must ensure that the fitted distribution covers the same
range over which, in theory, the variable being modelled may extend; for example, a four-parameter
beta distribution fitted to data will not extend past the range of the observed data if its minimum and
maximum are determined by the minimum and maximum of the observed data. Analysts should ensure
that the discrete or continuous nature of the distribution matches that of the variable. They should also
be flexible about using a different distribution type in a later model, should more data become available,
although this may cause confusion when comparing old and new versions of the same model. Finally,
they may find it difficult to persuade the decision-maker of the validity of the model: seeing an unusual
distribution in a model with no intuitive logic associated with its parameters can easily invoke distrust
of the model itself. Analysts should consider including in their report a plot of the distribution being
used against the observed data to reassure the decision-maker of its appropriateness.
The distribution parameters that make a distribution type best fit the available data can be determined
in several ways. The most common and most flexible technique is to determine parameter values known
as maximum likelihood estimators (MLEs), described in Section 10.3.1. The MLEs of the distribution
are the parameters that maximise the joint probability density or probability mass for the observed data.
MLEs are very useful because, for several common distributions, they provide a quick way to arrive
at the best-fitting parameters. For example, the normal distribution is defined by its mean and standard
deviation, and its MLEs are the mean and standard deviation of the observed data. More often than not,
however, when we fit a distribution to data using maximum likelihood, we need to use an optimiser
(like Microsoft Solver which comes with Microsoft Excel) to find the combination of parameter values
that maximises the likelihood function. Other methods of fit tend to find parameter values that minimise
some measure of goodness of fit, some of which are described in Section 10.3.4. Both using MLEs
and minimising goodness-of-fit statistics enable us to determine first-order distributions. However, for
fitting second-order distributions we need additional techniques for quantifying the uncertainty about
parameter values, like the bootstrap, Bayesian inference and some classical statistics.

10.3.1 Maximum likelihood estimators
The maximum likelihood estimators (MLEs) of a distribution type are the values of its parameters that
produce the maximum joint probability density for the observed dataset x . In the case of a discrete


distribution, MLEs maximise the actual probability of that distribution type being able to generate the
observed data.
Consider a probability distribution type defined by a single parameter α. The likelihood function L(α) that a set of n data points {xi} could be generated from the distribution with probability density f(x) - or, in the case of a discrete distribution, probability mass - is given by

L(α) = f(x1) * f(x2) * . . . * f(xn)

The MLE α̂ is then that value of α that maximises L(α). It is determined by taking the partial derivative of L(α) with respect to α and setting it to zero:

∂L(α)/∂α = 0

For some distribution types this is a relatively simple algebraic problem; for others the differential equation is extremely complicated and is solved numerically instead. This is the equivalent of using Bayesian inference with a uniform prior and then finding the peak of the posterior uncertainty distribution for α. Distribution-fitting software has made this process very easy to perform automatically.
Example 10.3 Determining the MLE for the Poisson distribution

The Poisson distribution has one parameter, the product λt, or just λ if we let t be a constant. Its probability mass function f(x) is given by

f(x) = e^(-λt) (λt)^x / x!

Because of the memoryless character of the Poisson process, if we have observed x events in a total time t, the likelihood function is given by

L(λ) = e^(-λt) (λt)^x / x!

Let l(λ) = ln L(λ); using the fact that t is a constant:

l(λ) = -λt + x ln(λ) + x ln(t) - ln(x!)

The maximum value of l(λ), and therefore of L(λ), occurs when the partial derivative with respect to λ equals zero, i.e.

∂l(λ)/∂λ = -t + x/λ = 0

Rearranging yields

λ̂ = x/t

i.e. it is the average number of observations per unit time. ◆


10.3.2 Finding the best-fitting parameters using optimisation
Figure 10.13 illustrates a Microsoft Excel spreadsheet set up to find the parameters of a gamma distribution that will best match the observed data. Excel provides the GAMMADIST function, which returns the probability density of a gamma distribution.
The Microsoft Solver in Excel is set to find the maximum value for cell F5 (or equivalently F7) by changing the values of α and β in cells F2 and F3.
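The same optimisation can be run outside Excel. The sketch below (Python with numpy and scipy; the 240 gamma-distributed values are a stand-in for the observed data) maximises the gamma log-likelihood by minimising its negative with a general-purpose optimiser, which is exactly the role Solver plays in Figure 10.13.

import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(9)
data = rng.gamma(shape=2.5, scale=4.0, size=240)   # stand-in for the observed data

def neg_log_likelihood(params):
    alpha, beta = params
    if alpha <= 0 or beta <= 0:
        return np.inf
    return -np.sum(stats.gamma.logpdf(data, a=alpha, scale=beta))

result = optimize.minimize(neg_log_likelihood, x0=[1.0, data.mean()],
                           method="Nelder-Mead")
alpha_hat, beta_hat = result.x
print("MLE alpha = %.3f, beta = %.3f" % (alpha_hat, beta_hat))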

10.3.3 Fitting distributions to truncated, censored or binned data
Maximum likelihood methods offer the greatest flexibility for distribution fitting because we need only
be able to write a probability model that corresponds with how our data are observed and then maximise
that probability by varying the parameters.
Censored data are those observations that we do not know precisely, only that they fall above or
below a certain value. For example, a weight scale will have a maximum value X it can record: we
might have some measurements off the scale and all we can say is that they are greater than X.
Truncated data are those observations that we do not see above or below some level. For example,
at a bank it may not be required to record an error below $100, and a sieve system may not select out
diamonds from a river below a certain diameter.
Binned data are those observations that we only know the value of in terms of bins or categories. For
example, one might record in a survey that customers were (0, 10], (10, 20], (20, 40] or 40+ years of age.

Figure 10.13 Using Solver to perform a maximum likelihood fit of a gamma distribution to data (cells C3:C242 contain =LOG10(GAMMADIST(B3,alpha,beta,0)); F5 = SUM(C3:C242); F7 = VoseGammaProb10(B3:B242,alpha,beta,0)).


It is a simple matter to produce a probability model for each category or combination, as shown in
the following examples where we are fitting to a continuous variable with density f ( x ) and cumulative
probability F ( x ) :
Example 10.4 Censored data

Observations. Measurement censored at Min and Max. Observations between Min and Max are a, b, c, d and e; p observations below Min and q observations above Max.
Likelihood function: f(a) * f(b) * f(c) * f(d) * f(e) * F(Min)^p * (1 - F(Max))^q.
Explanation. For the p values we only know that they are below some value Min, and the probability of being below Min is F(Min). We know q values are above Max, each with probability (1 - F(Max)). The other values we have the exact measurements for. ◆
Example 10.5 Truncated data

Observations. Measurement truncated at Min and Max. Observations between Min and Max are a, b, c, d and e.
Likelihood function: f(a) * f(b) * f(c) * f(d) * f(e) / (F(Max) - F(Min))^5.
Explanation. We only observe a value if it lies between Min and Max, which has probability (F(Max) - F(Min)). ◆
Example 10.6 Binned data

Observations. Measurements binned into continuous categories as follows:

Bin          up to 10    10 to 20    20 to 50    over 50
Frequency    a           b           c           d

Likelihood function: F(10)^a * (F(20) - F(10))^b * (F(50) - F(20))^c * (1 - F(50))^d.
Explanation. We observe values in bins between a Low and a High value with probability F(High) - F(Low). ◆
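As a worked illustration of the censored case of Example 10.4, the sketch below (Python with numpy and scipy; the five exact observations, the censoring limits and the counts p and q are made-up numbers) maximises the likelihood f(a) . . . f(e) * F(Min)^p * (1 - F(Max))^q for a normal distribution; the truncated and binned likelihoods can be coded in exactly the same way.

import numpy as np
from scipy import stats, optimize

exact = np.array([12.3, 15.1, 9.8, 14.4, 11.7])   # observations a, b, c, d, e (illustrative)
Min, Max = 8.0, 16.0                              # censoring limits
p, q = 3, 2                                       # counts below Min and above Max

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    ll = np.sum(stats.norm.logpdf(exact, mu, sigma))    # exact observations
    ll += p * stats.norm.logcdf(Min, mu, sigma)         # F(Min)^p
    ll += q * stats.norm.logsf(Max, mu, sigma)          # (1 - F(Max))^q
    return -ll

result = optimize.minimize(neg_log_likelihood, x0=[12.0, 3.0], method="Nelder-Mead")
print("fitted mu, sigma:", result.x)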

10.3.4 Goodness-of-fit statistics
Many goodness-of-fit statistics have been developed, but two are in most common use. These are the
chi-square (χ²) and Kolmogorov-Smirnoff (K-S) statistics, generally used for discrete and continuous
distributions respectively. The Anderson-Darling (A-D) statistic is a sophistication of the K-S statistic.
The lower the value of these statistics, the closer the theoretical distribution appears to fit the data.


Goodness-of-fit statistics are not intuitively easy to understand or interpret. They do not provide a true
measure of the probability that the data actually come from the fitted distribution. Instead, they provide
a probability that random data generated from the fitted distribution would have produced a goodness-of-fit statistic value as low as that calculated for the observed data. By far the most intuitive measure of
goodness of fit is a visual comparison of probability distributions, as described in Section 10.3.5. The
reader is encouraged to produce these plots to assure himself or herself of the validity of the fit before
labouring over goodness-of-fit statistics.
Critical values and confidence intervals for goodness-of-fit statistics

Analysis of the χ², K-S and A-D statistics can provide confidence intervals proportional to the probability that the fitted distribution could have produced the observed data. It is important to note that this is not equivalent to the probability that the data did, in fact, come from the fitted distribution, since there may be many distributions that have similar shapes and that could have been quite capable of generating the observed data. This is particularly so for data that are approximately normally distributed, since many distributions tend to a normal shape under certain conditions.
Critical values are determined by the required confidence level α. They are the values of the goodness-of-fit statistic that have a probability of being exceeded that is equal to the specified confidence level. Critical values for the χ² test are found directly from the χ² distribution. The shape and range of the χ² distribution are defined by the degrees of freedom ν, where ν = N - a - 1, N = number of histogram bars or classes and a = number of parameters that are estimated to determine the best-fitting distribution.
Figure 10.14 shows a descending cumulative plot for the χ²(11) distribution, i.e. a χ² distribution with 11 degrees of freedom. This plot shows an 80 % chance (the confidence interval) that a value would have occurred that was higher than 6.988 (the critical value at an 80 % confidence level) for data that were actually drawn from the fitted distribution, i.e. there is only a 20 % chance that the χ² value could be this small. If analysts are conservative and accept this 80 % chance of falsely rejecting the fit, their confidence interval α equals 80 % and the corresponding critical value is 6.988, and they will not accept any distribution as a good fit if its χ² is greater than 6.988.

Figure 10.14 Descending cumulative plot of the Chi-Squared(11) distribution.
Critical values for K-S and A-D statistics have been found by Monte Carlo simulation (Stephens,
1974, 1977; Chandra, Singpurwalla and Stephens, 1981). Tables of critical values for the K-S statistic
are very commonly found in statistical textbooks. Unfortunately, the standard K-S and A-D values are
of limited use for comparing critical values if there are fewer than about 30 data points. The problem
arises because these statistics are designed to test whether a distribution with known parameters could
have produced the observed data. If the parameters of the fitted distribution have been estimated from the
data, the K-S and A-D statistics will produce conservative test results, i.e. there is a smaller chance of
a well-fitting distribution being accepted. The size of this effect varies between the types of distribution
being fitted. One technique for getting round this problem is to use the first two-fifths or so of the data
to estimate the parameters of a distribution, using MLEs for example, and then to use the remaining
data to check the goodness of fit.
Modifications to the K-S and A-D statistics have been determined to correct for this problem, as shown in Tables 10.2 and 10.3 (see the BestFit manual published in 1993), where n is the number of data points and Dn and An² are the unmodified K-S and A-D statistics respectively.
Another goodness-of-fit statistic with intuitive appeal, similar to the A-D and K-S statistics, is the Cramer-von Mises statistic Y:

Y = Σ (i=1 to n) [F0(Xi) - (2i - 1)/(2n)]²

The statistic essentially sums the square of the differences between the cumulative percentile F0(Xi) for the fitted distribution for each Xi observation and the average of i/n and (i - 1)/n: the low and high plots of the empirical cumulative distribution of Xi values. Tables for this statistic can be found in Anderson and Darling (1952).
Table 10.2 Kolmogorov-Smirnoff statistic.

Distribution                   Modified test statistic
Normal                         Dn(√n - 0.01 + 0.85/√n)
Exponential                    (Dn - 0.2/n)(√n + 0.26 + 0.5/√n)
Weibull and extreme value      √n Dn
All others                     Dn(√n + 0.12 + 0.11/√n)

Table 10.3 Anderson-Darling statistics.

Distribution                   Modified test statistic
Normal                         An²(1 + 0.75/n + 2.25/n²)
Exponential                    An²(1 + 0.6/n)
Weibull and extreme value      An²(1 + 0.2/√n)
All others                     An²


The chi-square goodness-of-fit statistic

The chi-square (χ²) statistic measures how well the expected frequency of the fitted distribution compares with the observed frequency of a histogram of the observed data. The chi-square test makes the following assumptions:
1. The observed data consist of a random sample of n independent data points.
2. The measurement scale can be nominal (i.e. non-numeric) or numerical.
3. The n data points can be arranged into histogram form with N non-overlapping classes or bars that cover the entire possible range of the variable.
The chi-square statistic is calculated as follows:

χ² = Σ (i=1 to N) {O(i) - E(i)}² / E(i)    (10.6)

where O(i) is the observed frequency of the ith histogram class or bar and E(i) is the expected frequency from the fitted distribution of x values falling within the x range of the ith histogram bar. E(i) is calculated as

E(i) = (F(imax) - F(imin)) * n    (10.7)

where F(x) is the distribution function of the fitted distribution, imax is the x value upper bound of the ith histogram bar and imin is the x value lower bound of the ith histogram bar.
Since the χ² statistic sums the squares of all of the errors (O(i) - E(i)), it can be disproportionately sensitive to any large errors, e.g. if the error of one bar is 3 times that of another bar, it will contribute 9 times more to the statistic (assuming the same E(i) for both).
χ² is the most commonly used of the goodness-of-fit statistics described here. However, it is very dependent on the number of bars, N, that are used. By changing the value of N, one can quite easily switch ranking between two distribution types. Unfortunately, there are no hard and fast rules for selecting the value of N. A good guide, however, is Scott's normal approximation, which generally appears to work very well: the histogram bar width is set to 3.49 s n^(-1/3), so N is the observed range divided by this width, where n is the number of data points and s is the sample standard deviation. Another useful guide is to ensure that no bar has an expected frequency smaller than about 1, i.e. E(i) > 1 for all i. Note that the χ² statistic does not require that all or any histogram bars are of the same width.
The χ² statistic is most useful for fitting distributions to discrete data and is the only statistic described here that can be used for nominal (i.e. non-numeric) data.
Example 10.7 Use of χ² with continuous data

A dataset of 165 points is thought to come from a Normal(70, 20) distribution. The data are first put into histogram form with 14 bars, as suggested by Scott's normal approximation (Table 10.4(a)). The four extreme bars have expected frequencies below 1 for a Normal(70, 20) distribution with 165 observations. These outside bars are therefore combined to produce a revised set of bar ranges. Table 10.4(b) shows the χ² calculation with the revised bar ranges.

Table 10.4 Calculation of the χ² statistic for a continuous dataset: (a) determining the bar ranges to be used; (b) calculation of χ² with the revised bar ranges.

(a) Histogram bars        Expected frequency
From A      To B          of Normal(70, 20)
-∞          10            0.22
10          20            0.80
20          30            2.73
30          40            7.27
40          50            15.15
50          60            24.73
60          70            31.59
70          80            31.59
80          90            24.73
90          100           15.15
100         110           7.27
110         120           2.73
120         130           0.80
130         +∞            0.22

(b) Revised bars          E(i) of                       Chi-square calc.
From A      To B          Normal(70, 20)    O(i)        {O(i) - E(i)}²/E(i)
-∞          20            1.02              3           3.80854
20          30            2.73              5           1.88948
30          40            7.27              6           0.22168
40          50            15.15             10          1.75344
50          60            24.73             21          0.56275
60          70            31.59             25          1.37523
70          80            31.59             37          0.92601
80          90            24.73             21          0.56275
90          100           15.15             17          0.22463
100         110           7.27              11          1.91447
110         120           2.73              6           3.92002
120         +∞            1.02              3           3.80854
                                            Chi-square: 20.96754

Table 10.5 Calculation of the χ² statistic for a discrete dataset: (a) tabulation of the data; (b) calculation of χ².

(a) x value     Observed frequency O(i)     Frequency E(i) of Poisson(4.456)
0               0                           1.579
1               8                           7.036
2               18                          15.675
3               20                          23.282
4               29                          25.936
5               21                          23.113
6               18                          17.165
7               10                          10.926
8               8                           6.086
9               2                           3.013
10              1                           1.343
11+             1                           0.846
Total:          136

(b) x value     Observed       Frequency E(i) of     Chi-square calc.
                frequency O(i) Poisson(4.456)        {O(i) - E(i)}²/E(i)
0               0              1.579                 1.5790
1               8              7.036                 0.1322
2               18             15.675                0.3448
3               20             23.282                0.4627
4               29             25.936                0.3621
5               21             23.113                0.1932
6               18             17.165                0.0406
7               10             10.926                0.0786
8               8              6.086                 0.6020
9               2              3.013                 0.3406
10+             2              2.189                 0.0163
                               Chi-square:           4.1521


Hypotheses

H0: the data come from a Normal(70, 20) distribution.
H1: the data do not come from the Normal(70, 20) distribution.

Decision

The χ² test statistic has a value of 21.0 from Table 10.4(b). There are ν = N - 1 = 12 - 1 = 11 degrees of freedom (a = 0 since no distribution parameters were determined from the data). Looking this up in a χ²(11) distribution, the probability that we would have such a high value of χ² when H0 is true is around 3 %. We therefore conclude that the data did not come from a Normal(70, 20) distribution. ◆
Example 10.8 Use of χ² with discrete data

A set of 136 data points is believed to come from a Poisson distribution. The MLE for the parameter λ of the Poisson is estimated by taking the mean of the data points: λ = 4.4559. The data are tabulated in frequency form in Table 10.5(a) and, next to it, the expected frequency from a Poisson(4.4559) distribution, i.e. E(i) = f(x) * 136, where

f(x) = e^(-4.4559) * 4.4559^x / x!

The expected frequency for a value of 11+, calculated as 136 - (the sum of all the other expected frequencies), is less than 1. The number of bars is therefore decreased, as shown in Table 10.5(b), to ensure that all expected frequencies are greater than 1.
Hypotheses

H0: the data come from a Poisson distribution.
H1: the data do not come from a Poisson distribution.

Decision

The χ² test statistic has a value of 4.152 from Table 10.5(b). There are ν = N - a - 1 = 11 - 1 - 1 = 9 degrees of freedom (a = 1 since one distribution parameter, the mean, was determined from the data). Looking this up in a χ²(9) distribution, the probability that we would have such a high value of χ² when H0 is true is just over 90 %. Since this is such a large probability, we cannot reasonably reject H0 and therefore conclude that the data fit a Poisson(4.4559) distribution. ◆
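The whole of Example 10.8 can be reproduced in a few lines. The sketch below (Python with numpy and scipy) takes the observed and expected frequencies of Table 10.5(b), computes the χ² statistic of Equation (10.6) and looks up the probability of exceeding it in a χ²(9) distribution.

import numpy as np
from scipy import stats

O = np.array([0, 8, 18, 20, 29, 21, 18, 10, 8, 2, 2])            # observed frequencies
E = np.array([1.579, 7.036, 15.675, 23.282, 25.936, 23.113,
              17.165, 10.926, 6.086, 3.013, 2.189])               # expected, Poisson(4.4559)

chi_sq = np.sum((O - E) ** 2 / E)
dof = len(O) - 1 - 1                     # nu = N - a - 1 with a = 1 estimated parameter
p_value = stats.chi2.sf(chi_sq, dof)
print("chi-square = %.4f, P(exceed | H0) = %.2f" % (chi_sq, p_value))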
I've covered the chi-square statistic quite a bit here, because it is used often, but let's just trace back a moment to the assumptions behind it. The χ²(ν) distribution is the sum of ν unit normal distributions squared. Equation (10.6) says

χ² = Σ (i=1 to N) {O(i) - E(i)}² / E(i)

so the test is assuming that each (O(i) - E(i))/√E(i) is approximately a Normal(0, 1), i.e. that O(i) is approximately Normal(E(i), √E(i)) distributed. O(i) is a Binomial(n, p) variable, where p = F(imax) - F(imin), and will only look somewhat normal when n is large and p is not near 0 or 1, in which case it will be approximately Normal(np, √(np(1 - p))). The point is that the chi-square test is based on an implicit assumption that there are a lot of observations for each bin - so don't rely on it. Maximum likelihood methods will give better fits than optimising the chi-square statistic and have more flexibility, and the value of the chi-square statistic as a measure of comparison between goodness of fits is highly questionable, since one should change the bin widths for each fitted distribution to give the same probability of a random sample lying within each bin, but those bin ranges will be different for each fitted distribution.
Kolmogorov-Smirnoff (K-S) statistic

The K-S statistic Dn is defined as

Dn = max |Fn(x) - F(x)|

where Dn is known as the K-S distance, n is the total number of data points, F(x) is the distribution function of the fitted distribution, Fn(x) = i/n and i is the cumulative rank of the data point.
The K-S statistic is thus only concerned with the maximum vertical distance between the cumulative distribution function of the fitted distribution and the cumulative distribution of the data. Figure 10.15 illustrates the concept for data fitted to a Uniform(0, 1) distribution.

The data are ranked in ascending order.
The upper FU(i) and lower FL(i) cumulative percentiles are calculated as follows:

FU(i) = i/n,    FL(i) = (i - 1)/n

where i is the rank of the data point and n is the total number of data points.
F(x) is calculated for the Uniform distribution (in this case F(x) = x).

Figure 10.15 Calculation of the Kolmogorov-Smirnoff distance Dn for data fitted to a Uniform(0, 1) distribution (the upper bound of Fn(xi) is i/n and the lower bound is (i - 1)/n).


The maximum distance Di between F(i) and F(x) is calculated for each i:

Di = max[ABS(FU(i) - F(xi)), ABS(FL(i) - F(xi))]

where ABS(. . .) finds the absolute value.
The maximum value of the Di distances is then the K-S distance Dn:

Dn = max(Di)
The K-S statistic is generally more useful than the χ² statistic in that the data are assessed at all data points, which avoids the problem of determining the number of bands into which to split the data.
However, its value is only determined by the one largest discrepancy and takes no account of the lack of
fit across the rest of the distribution. Thus, in Figure 10.16 it would give a worse fit to the distribution
in (a) which has one large discrepancy than to the distribution in (b) which has a poor general fit over
the whole x range.
The vertical distance between the observed distribution Fn(x) and the theoretical fitted distribution F(x) at any point, say x0, itself has a distribution with a mean of zero and a standard deviation σK-S given by binomial theory:

σK-S = √(F(x0)(1 - F(x0))/n)

The size of the standard deviation σK-S over the x range is shown in Figure 10.17 for a number of distribution types with n = 100. The position of Dn along the x axis is more likely to occur where σK-S is greatest, which, Figure 10.17 shows, will generally be away from the low-probability tails. This insensitivity of the K-S statistic to lack of fit at the extremes of the distributions is corrected for in the Anderson-Darling statistic.
The enlightened statistical literature is quite scathing about distribution-fitting software that uses the K-S statistic as a goodness-of-fit measure - particularly if one has estimated the parameters of a fitted distribution
from data (as opposed to comparing data against a predefined distribution). This was not the intention
of the K-S statistic, which assumes that the fitted distribution is fully specified. In order to use it as a
goodness-of-fit measure that ranks levels of distribution fit, one must perform simulation experiments
to determine the critical region of the K-S statistic in each case.
Anderson-Darling (A-D) statistic

The A-D statistic An² is defined as

An² = n ∫ [Fn(x) - F(x)]² Ψ(x) f(x) dx

where

Ψ(x) = 1 / (F(x)[1 - F(x)])

n is the total number of data points, F(x) is the distribution function of the fitted distribution, f(x) is the density function of the fitted distribution, Fn(x) = i/n and i is the cumulative rank of the data point.
The Anderson-Darling statistic is a more sophisticated version of the Kolmogorov-Smirnoff statistic. It is more powerful for the following reasons:

Ψ(x) compensates for the increased variance of the vertical distances between distributions, which is described in Figure 10.17.
f(x) weights the observed distances by the probability that a value will be generated at that x value.
The vertical distances are integrated over all values of x to make maximum use of the observed data (the K-S statistic only looks at the maximum vertical distance).

Figure 10.16 How the K-S distance Dn can give a false measure of fit because of its reliance on the single largest distance between the two cumulative distributions rather than looking at the distances over the whole possible range: (a) distribution is generally a good fit except in one particular area; (b) distribution is generally a poor fit but with no single large discrepancies.

Figure 10.17 Variation in the standard deviation of the K-S statistic Dn over the range of a variety of distributions (Normal(100, 10), Pareto(1, 2), Uniform(0, 10), Triangular(0, 5, 20), Exponential(25) and Rayleigh(3)). The greater the standard deviation, the more chance that Dn will fall in that part of the range, which shows that the K-S statistic will tend to focus on the degree of fit at x values away from a distribution's tails.

The A-D statistic is therefore a generally more useful measure of fit than the K-S statistic, especially
where it is important to place equal emphasis on fitting a distribution at the tails as well as the main
body. Nonetheless, it still suffers from the same problem as the K-S statistic in that the fitted distribution
should in theory be fully specified, not estimated from the data. It suffers from a larger problem in that
the confidence region has been determined for only a very few distributions.
A better goodness-of-fit measure

For reasons I have explained above, the chi-square, Kolmogorov-Smirnoff and Anderson-Darling
goodness-of-fit statistics are technically all inappropriate as a method of comparing fits of distributions
to data. They are also limited to having precise observations and cannot incorporate censored, truncated
or binned data. Realistically, most of the time we are fitting a continuous distribution to a set of precise


observations, and then the Anderson-Darling does a reasonable job. However, for important work you
should instead consider using statistical measures of fit called information criteria.
Let n be the number of observations (e.g. data values, frequencies), k be the number of parameters
to be estimated (e.g. the normal distribution has two parameters: mu and sigma) and ln[L_max] be the
maximized value of the log-likelihood for the estimated model.
1. SIC (Schwarz information criterion, aka Bayesian information criterion, BIC)

   SIC = ln[n]k - 2 ln[L_max]

2. AICc (Akaike information criterion)

   AICc = 2(n/(n - k - 1))k - 2 ln[L_max]

3. HQIC (Hannan-Quinn information criterion)

   HQIC = 2 ln[ln[n]]k - 2 ln[L_max]

The aim is to find the model with the lowest value of the selected information criterion. The
-2 ln[L_max] term appearing in each formula is an estimate of the deviance of the model fit. The
coefficient for k in the first part of each formula shows the degree to which the number of model
parameters is being penalised. For n ≥ 20 or so the SIC (Schwarz, 1997) is the strictest in penalising
loss of degrees of freedom by having more parameters in the fitted model. For n ≥ 40 the AICc (Akaike,
1974, 1976) is the least strict of the three and the HQIC (Hannan and Quinn, 1979) holds the middle
ground, or is the least penalising for n ≤ 20.
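If you want to compute these criteria outside ModelRisk, the following is a minimal Python sketch, assuming SciPy is available; the data set and the pair of candidate distributions are purely illustrative and not taken from the book.

    import numpy as np
    from scipy import stats

    def information_criteria(logL_max, n, k):
        """Return SIC/BIC, AICc and HQIC given the maximised log-likelihood."""
        sic = np.log(n) * k - 2 * logL_max
        aicc = 2 * (n / (n - k - 1)) * k - 2 * logL_max
        hqic = 2 * np.log(np.log(n)) * k - 2 * logL_max
        return sic, aicc, hqic

    # Illustrative comparison: fit a normal and a gamma to the same data by MLE.
    rng = np.random.default_rng(1)
    data = rng.gamma(shape=4, scale=7, size=47)

    mu, sigma = stats.norm.fit(data)                    # 2 fitted parameters
    logL_norm = stats.norm.logpdf(data, mu, sigma).sum()

    a, loc, scale = stats.gamma.fit(data, floc=0)       # 2 free parameters (loc held at 0)
    logL_gamma = stats.gamma.logpdf(data, a, loc, scale).sum()

    for name, logL in [("normal", logL_norm), ("gamma", logL_gamma)]:
        print(name, information_criteria(logL, n=len(data), k=2))
    # The fitted model with the lowest value of the chosen criterion is preferred.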
ModelRisk applies modified versions of these three criteria as a means of ranking each fitted model,
whether it be fitting a distribution, a time series model or a copula. If you fit a number of models to
your data, try not to pick automatically the fitted distribution with the best statistical result, particularly
if the top two or three are close. Also, look at the range and shape of the fitted distribution and see
whether they correspond to what you think is appropriate.

10.3.5 Goodness-of-fit plots
Goodness-of-fit plots offer the analyst a visual comparison between the data and fitted distributions.
They provide an overall picture of the errors in a way that a goodness-of-fit statistic cannot and allow
the analyst to select the best-fitting distribution in a more qualitative and intuitive way. Several types
of plot are in common use. Their individual merits are discussed below.
Comparison of probability density

Overlaying a histogram plot of the data with a density function of the fitted distribution is usually the
most informative comparison (see Figure 10.18(a)). It is easy to see where the main discrepancies are
and whether the general shape of the data and fitted distribution compare well. The same scale and
number of histogram bars should be used for all plots if a direct comparison of several distribution fits
is to be made for the same data.

Figure 10.18 Examples of goodness-of-fit plots comparing an input data set with a fitted Normal(99.18, 16.52) distribution: (a) comparison of probability density; (b) comparison of cumulative probability distributions; (c) plot of difference between probability densities; (d) probability-probability plot; (e) probability-probability plot for a discrete distribution (Poisson); (f) quantile-quantile plot.

Comparison of probability distributions

An overlay of the cumulative frequency plots of the data and the fitted distribution is sometimes used
(see Figure 10.18(b)). However, this plot has a very insensitive scale and the cumulative frequency of
most distribution types follows very similar S-curves. This type of plot will therefore only show up very
large differences between the data and fitted distributions and is not generally recommended as a visual
measure of the goodness of fit.
Difference between probability densities

This plot is derived from the above comparison of probability density, plotting the difference between
the probability densities (see Figure 10.18(c)). It has a far more sensitive scale than the other plots


described here. The size of the deviations is also a function of the number of classes (bars) used to plot
the histogram. In order to make a direct comparison between other distribution function fits using this
type of plot, analysts must ensure that the same number of histogram classes is used for all plots. They
must also ensure that the same vertical scale is used, as this can vary widely between fits.
Probability-probability (P-P) plots

This is a plot of the cumulative distribution of the fitted curve F(x) against the cumulative frequency
F_n(x) = i/(n + 1) for all values of x_i (see Figure 10.18(d)). The better the fit, the closer this plot
resembles a straight line. It can be useful if one is interested in closely matching cumulative percentiles,
resembles a straight line. It can be useful if one is interested in closely matching cumulative percentiles,
and it will show significant differences between the middles of the two distributions. However, the plot is
far less sensitive to any discrepancies in fit than the comparison of probability density plot and is therefore
not often used. It can also be rather confusing when used to review discrete data (see Figure 10.18(e))
where a fairly good fit can easily be masked, especially if there are only a few allowable x values.


Quantile-Quantile (Q-Q) plots


This is a plot of the observed data x_i against the x values where F(x) = F_n(x) = i/(n + 1)
(see Figure 10.18(f)). As with P-P plots, the better the fit, the closer this plot resembles a straight
line. It can be useful if one is interested in closely matching cumulative percentiles, and it will show
significant differences between the tails of the two distributions. However, the plot suffers from the
same insensitivity problem as the P-P plots.

10.4 Fitting a Second-Order Parametric Distribution to Observed
Data
The techniques for quantifying uncertainty, described in the first part of this chapter, can be used to
determine the distribution of uncertainty for parameters of a parametric distribution fitted to data. The
three main techniques are classical statistics methods, the bootstrap and Bayesian inference by Gibbs
sampling. The main issue in estimating the parameters of a distribution from data is that the uncertainty
distributions of the estimated parameters are usually linked together in some way.
Classical statistics tends to overcome this problem by assuming that the parameter uncertainty distributions are normally distributed, in which case it determines a covariance between these distributions.
However, in most situations one comes across, the parameter uncertainty distributions are not normal
(they only tend towards normality as the amount of data gets very large), so the approach is very limited.
The parametric bootstrap is much better, since one simply resamples from the MLE fitted distribution
in the same fashion in which the data appear, and in the same amount, of course. Then, refitting using
MLE again gives us random samples from the joint uncertainty distribution for the parameters. The
main limitation to the bootstrap is in fitting a discrete distribution, particularly one where there are few
allowable values, as this will make the joint uncertainty distribution very "grainy".
Markov chain Monte Carlo will also generate random samples from the joint uncertainty density. It
is very flexible but has the small problem of setting the prior distributions.
Example 10.9 Fitting a second-order normal distribution to data with classical statistics

The normal distribution is easy to fit to data since we have the z-test and chi-square test giving us
precise formulae. There are not many other distributions that can be handled so conveniently. Classical
statistics tells us that the uncertainty distributions for the mean and standard deviation of the normal


distribution are given by Equation (9.3)

when we don't know the mean, and by Equation (9.1)

when we know the standard deviation.
So, if we simulate possible values for the standard deviation first with Equation (9.3), we can feed
these values into Equation (9.1) to determine the mean. +
Example 10.10 Fitting a second-order normal distribution to data using the parametric bootstrap

The sample mean (Excel: AVERAGE) and sample standard deviation (Excel: STDEV) are the MLE
estimates for the normal distribution. Thus, if we have n data values with mean x̄ and standard deviation
s, we generate n independent Normal(x̄, s) distributions and recalculate their mean and standard deviation
using AVERAGE and STDEV to generate uncertainty values for the population parameters. +
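A minimal NumPy sketch of the same parametric bootstrap, with illustrative data; it is not part of the book's spreadsheet model.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(100, 10, size=30)        # illustrative data set

    x_bar, s = data.mean(), data.std(ddof=1)   # AVERAGE and STDEV equivalents

    # Resample n values from Normal(x_bar, s) many times and refit each resample.
    boot = rng.normal(x_bar, s, size=(5000, len(data)))
    mean_uncertainty = boot.mean(axis=1)       # uncertainty about the population mean
    sd_uncertainty = boot.std(axis=1, ddof=1)  # uncertainty about the population sd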
Example 10.11 Fitting a second-order gamma distribution to data using the parametric bootstrap

There are no equations for direct determination of the MLE parameter values for a gamma distribution,
so one needs to construct the likelihood function and optimise it by varying the parameters, which is
rather tiresome but by far the more common situation encountered. ModelRisk offers distribution-fitting
algorithms that do this automatically. For example, the two-cell array {VoseGammaFitP(data, TRUE)}
will generate values from the joint uncertainty distribution for a gamma distribution fit to the set of
values data. The array {VoseGammaFitP(data, FALSE)} will return just the MLE values. The function
VoseGammaFit(data, TRUE) returns random samples from a gamma distribution, with the parameter
uncertainty embedded, and VoseGammaFit(data, 0.99, TRUE) will return random samples from the
uncertainty distribution for the 99th percentile of a gamma distribution fit to data. +
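The same idea can be sketched in Python with SciPy doing the numerical MLE; the data and the number of bootstrap samples are illustrative assumptions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    data = rng.gamma(shape=4, scale=7, size=47)       # illustrative data

    # MLE fit of a gamma distribution (numerical optimisation of the likelihood).
    a_hat, _, b_hat = stats.gamma.fit(data, floc=0)   # shape alpha, scale beta

    # Parametric bootstrap of the joint uncertainty for (alpha, beta):
    # resample data sets of the same size from the fitted distribution and refit each one.
    B = 1000
    alphas, betas = np.empty(B), np.empty(B)
    for i in range(B):
        resample = stats.gamma.rvs(a_hat, scale=b_hat, size=len(data), random_state=rng)
        alphas[i], _, betas[i] = stats.gamma.fit(resample, floc=0)

    # e.g. the uncertainty about the 99th percentile of the fitted gamma:
    p99_uncertainty = stats.gamma.ppf(0.99, alphas, scale=betas)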
Example 10.12 Fitting a second-order gamma distribution to data using WinBUGS

The following WinBUGS model takes 47 data values (that were in fact drawn from a Gamma(4,7)
distribution) and fits a gamma distribution. There are two important things to note here: in WinBUGS
the scale parameter lambda is defined as the reciprocal of the beta scale parameter more commonly
used (and this book's convention); and I have used a prior for each parameter of Gamma(1, 1000) [in
more standard convention], which is an exponential with mean 1000. The exponential distribution is used
because it extends from zero to infinity, which matches the parameters' domains, and an exponential
with such a large mean will appear quite flat over the range of interest (so it is reasonably uninformative).
The model is:

    model
    {
        for(i in 1:M) {
            x[i] ~ dgamma(alpha, lambda)
        }
        alpha ~ dgamma(1.0, 1.0E-3)
        beta ~ dgamma(1.0, 1.0E-3)
        lambda <- 1/beta
    }

After a burn-in of 100000 iterations, the estimates are as shown in Figure 10.19.
The estimates are centred roughly around 4 (mean = 4.1 11) and 7 (mean = 6.288), as we might have
hoped having generated the samples from a Gamma(4,7). We can check to see whether the choice of prior
has much effect. For alpha the uncertainty distribution ranges from about 2 to 6: the Exponential(1000)

Figure 10.19 WinBUGS estimates of the gamma distribution parameters for Example 10.12 (posterior density plots for alpha and beta, 10 000 samples each).

Figure 10.20 5000 posterior distribution samples from the WinBUGS model to estimate gamma distribution
parameters for Example 10.12.


Figure 10.21 Plot showing the empirical cumulative distribution of the data in bold and the second-order
fitted lognormal distribution in grey.

densities at 2 and 6 are 9.98E-4 and 9.94E-4 respectively, a ratio of 1.004, so essentially flat over the posterior region. Between 4 and 13, the range for the beta parameter, the ratio is 1.009 - again essentially flat.
Figure 10.20 shows why it is necessary to estimate the joint uncertainty distribution. The banana
shape of this scatter plot shows that there is a strong correlation between the parameter estimates. You
can understand why this relationship occurs intuitively as follows: the mean of a population distribution
can be estimated quite quickly from the data and will have roughly normally distributed uncertainty:
in this case the 47 observations have sample mean = 25.794 and sample variance = 184.06, so the
population mean uncertainty is Normal(25.794, SQRT(184.06/47)) = Normal(25.794, 1.979). The mean
of a Gamma(α, β) distribution is αβ. Equating the two says that if α = 6 then β must be about 4.3 ± 0.3,
and if α = 3 then β is about 8.6 ± 0.6, which can be seen in Figure 10.20. +
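For readers without WinBUGS, the same Bayesian fit can be sketched with a simple random-walk Metropolis sampler in Python. This is only an illustration of the idea, not the book's model: the synthetic data, proposal step sizes and burn-in length are assumptions, while the priors are the Exponential(mean 1000) priors described above.

    import numpy as np
    from scipy.special import gammaln

    rng = np.random.default_rng(7)
    data = rng.gamma(shape=4, scale=7, size=47)      # synthetic data, as in the example
    n = len(data)

    def log_posterior(alpha, beta):
        """Gamma(alpha, beta) likelihood (beta = scale) with Exponential(mean 1000) priors."""
        if alpha <= 0 or beta <= 0:
            return -np.inf
        log_lik = ((alpha - 1) * np.log(data) - data / beta).sum() \
                  - n * (gammaln(alpha) + alpha * np.log(beta))
        log_prior = -alpha / 1000 - beta / 1000
        return log_lik + log_prior

    # Random-walk Metropolis over (alpha, beta) with a symmetric normal proposal.
    samples = np.empty((20000, 2))
    alpha, beta = 2.0, 10.0                          # arbitrary starting point
    current = log_posterior(alpha, beta)
    for i in range(len(samples)):
        prop_a = alpha + rng.normal(0, 0.4)
        prop_b = beta + rng.normal(0, 1.0)
        proposed = log_posterior(prop_a, prop_b)
        if np.log(rng.uniform()) < proposed - current:
            alpha, beta, current = prop_a, prop_b, proposed
        samples[i] = alpha, beta

    posterior = samples[5000:]                       # discard burn-in
    print(posterior.mean(axis=0))                    # centred near the generating values (4, 7)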

10.4.1 Second-order goodness-of-fit plots
Second-order goodness-of-fit plots are the same as the first-order plots in Figure 10.18, except that
uncertainty about the distribution is expressed as a series of lines describing possible true distributions
(sometimes called a candyfloss or spaghetti plot). Figure 10.21 gives an example.
In Figure 10.21 the grey lines represent the fitted lognormal cumulative distribution function for 15
samples from the joint uncertainty distribution for the lognormal's mean and standard deviation. This
gives an intuitive visual description of how certain we are about the fitted distribution. ModelRisk's
distribution-fitting facility will show these plots automatically with a user-defined number of "spaghetti"
lines.
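A sketch of such a spaghetti plot in Python, using a bootstrap as the source of parameter uncertainty and matplotlib for the overlay; the data set and the number of grey lines are illustrative.

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(3)
    data = np.sort(rng.lognormal(mean=3, sigma=0.4, size=60))    # illustrative data
    ecdf = np.arange(1, len(data) + 1) / (len(data) + 1)

    plt.step(data, ecdf, where="post", color="black", lw=2)      # empirical distribution, in bold

    # 15 possible "true" lognormals sampled from the bootstrapped parameter uncertainty.
    x = np.linspace(data.min() * 0.8, data.max() * 1.2, 300)
    for _ in range(15):
        resample = rng.choice(data, size=len(data), replace=True)
        shape, loc, scale = stats.lognorm.fit(resample, floc=0)  # refit a lognormal
        plt.plot(x, stats.lognorm.cdf(x, shape, loc, scale), color="grey", lw=0.8)

    plt.xlabel("x")
    plt.ylabel("cumulative probability")
    plt.show()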

Chapter 11

Sums of random variables
One of the most common mistakes people make in producing even the most simple Monte Carlo
simulation model is in calculating sums of random variables.
In this chapter we look at a number of techniques that have extremely broad use in risk analysis in
estimating the sum of random variables. We start with the basic problem and how this can be simulated.
Then we examine how simulation can be improved, and then how it can often be replaced with a direct
construction of the distribution of the sum of the random variables. Finally, I introduce the ability to
model correlation between variables that are being summed.

11.1 The Basic Problem
We are very often in the situation of wanting to estimate the aggregate (sum) of a number n of variables,
each of which follows the same distribution or has the same value X (see Table 11.1, for example).
We have six situations to deal with (Table 11.2).
Situations A, B, D and E

For situations A, B, D and E the mathematics is very easy to simulate:
SUM = n * X
Situation C

For situation C, where X are independent random variables (i.e. each X being summed can take a
different value) and n is fixed, we often have a simple way to determine the aggregate distribution
based on known identities. The most common identities are listed in Table 11.3.
We also know from the central limit theorem that, if n is large enough, the sum will often look like
a normal distribution. If X has a mean μ and standard deviation σ, then, as n becomes large, we get

Sum ≈ Normal(n * μ, √n * σ)

which is rather nice because it means we can take a distribution like the relative distribution and
determine its moments (the ModelRisk function VoseMoments will do this automatically for you), or just
use the mean and standard deviation of relevant observations of X. It also explains why the
distributions in the right-hand column of Table 11.3 often look approximately normal.
When none of these identities applies, we have to simulate a column of X variables of length n and
add them up, which is usually not too onerous in computing time or spreadsheet size because if n is
large we can usually use the central limit theorem approximation.


Table 11.1 Variables and their aggregate distribution.

Aggregate distribution                           X                                          N
Total receipts in a year                         Purchase of each customer                  Customers in a year
Bacteria in my three-raw-egg milkshake           Bacteria in a contaminated egg             Contaminated eggs
Total credit default exposure                    Amount owed by a creditor                  Credit defaults
Total financial exposure of insurance company    Amount due on death for a policyholder     Life insurance holders who die next year

Table 11.2 Different situations where aggregate distributions are needed.

Situation    N                  X
A            Fixed value        Fixed value
B            Fixed value        Random variable, all n take same value
C            Fixed value        Random variable, all n take different values (iids)
D            Random variable    Fixed value
E            Random variable    Random variable, all n take same value
F            Random variable    Random variable, all n take different values (iids)

Table 11.3 Known identities for aggregate distributions.

X                        Aggregate distribution
Bernoulli(p)             Binomial(n, p)
BetaBinomial(m, α, β)    BetaBinomial(n * m, α, β)
Binomial(m, p)           Binomial(n * m, p)
Cauchy(a, b)             n * Cauchy(a, b)
ChiSq(ν)                 ChiSq(n * ν)
Erlang(m, β)             Erlang(n * m, β)
Exponential(β)           Gamma(n, β)

An alternative for situation C available in ModelRisk is to use the VoseAggregateMC(n, distribution)
function. For example, if we supply n = 1000 and a Lognormal(2, 6) distribution object, the function will
generate and add together 1000 independent random samples from a Lognormal(2, 6) distribution. However,
were we instead to supply n = 1000 and a Gamma(2, 6) distribution object, the function would generate a
single value from a Gamma(2 * 1000, 6) distribution, because all of the identities in Table 11.3 are
programmed into the function.

Situation F

This leaves us with situation F - the sum of a random number of random variables. The most basic
simulation method is to produce a model where a value for n is generated in one spreadsheet cell and
then a column of X variables is created that varies in size according to the value of n (see, for example,
Figure 11.1).
In this model, n is a Poisson(12) random variable generated in cell C2. The Lognormal(100, 10) X
values are generated in column C only if the count value in column B is smaller than or equal to n.
For example, in the iteration shown, a value of 14 is generated for n, so 14 X values are generated in
column C.
The method is quite generally applicable, but among other problems it is inefficient. Imagine if n had
been Poisson(10 000), for example - we would need huge B and C columns to make the model work.
It is also difficult from a modelling perspective because the model has to be written for a specific range
of n. One cannot simply change the parameter in the Poisson distribution.
We have a couple of options based on the techniques described above for situation C. If we are
adding together X variables shown in Table 11.3, then we can apply those identities by simulating n
A )

B

I

c

IDI

E

I

F

I

G

I

H

I

1

2

n:

14

4

Count
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

X value
104.2002
99.24762
104.939
103.0028
97.13033
137.6911
119.2818
111.4683
101.3274
102.6788
107.0966
96.36928
93.28309
101.6922
0
0
0
0
0
0

3

5
6
7

8
9
&
11

2
13
14

15
2
1
7
18
19
2
21

22
23

24
25 .

Total

C2:
C5:C24
F6 (output)

Formulae table
=VosePoisson(l2)
=IF(B5~$C$2,0,VoseLognormal(l00,10))
=SUM(C5:C24)

Figure 11.1 Model for the sum of a random number of random variables.

I

IJ

304

Risk Analysis

in one cell and linking that to a cell that simulates from the aggregate variable conditioned on n. For
example, imagine we are summing Poisson(100) X variables where each X variable takes a Gamma(2, 6)
distribution. Then we can write
Cell A1: =VosePoisson(100)
Cell A2 (output): =VoseGamma(A1 * 2, 6)
We can also use the central limit theorem method. Imagine we have n = Poisson(1000) and X =
Beta4(3, 7, 0, 8), which is illustrated in Figure 11.2.
The distribution is not terribly asymmetric, so adding roughly 1000 of them will look very close
to a normal distribution, which means that we can be confident in applying the central limit theorem
approximation, shown in the model of Figure 11.3.
Here we have made use of the VoseMoments array function, which returns the moments of a distribution object. Most software, however, will allow you at least to view the moments of a distribution,
and, if not, you can simulate the distribution on its own and empirically determine its moments from
the values or, if you need greater accuracy or speed, apply the equations given in the distribution compendium in Appendix III. The VoseCLTSum function performs the same calculation as that shown in F5 but is a
little more intuitive. Alternatively, the VoseAggregateMC function will, in this iteration, add together 957 values
drawn from the Beta4 distribution because there is no known identity for sums of Beta4 distributions.

Figure 11.2 A Beta4(3, 7, 0, 8) distribution.

Figure 11.3 Model for the central limit theorem approximation. Formulae table: C2: =VosePoisson(1000); C3: =VoseBeta4Object(3,7,0,8); {B5:C8}: {=VoseMoments(C3)}; F5 (output): =VoseNormal(C2*C5,SQRT(C2*C6)); F6 (alternative): =VoseCLTSum(C2,C5,SQRT(C6)); F7 (alternative): =VoseAggregateMC(C2,C3).
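A Python sketch of the same shortcut, assuming SciPy for the Beta4 (a beta distribution rescaled to [0, 8]); the brute-force comparison is only there to show that the two approaches agree.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    severity = stats.beta(3, 7, loc=0, scale=8)        # Beta4(3, 7, 0, 8)
    m, v = severity.mean(), severity.var()             # moments of a single X

    def clt_aggregate(n_draws):
        """Sum of Poisson(1000) iid Beta4 variables using the CLT, conditional on n."""
        n = rng.poisson(1000, size=n_draws)
        return rng.normal(n * m, np.sqrt(n * v))

    def brute_force(n_draws):
        """Direct summation, for comparison."""
        return np.array([severity.rvs(rng.poisson(1000), random_state=rng).sum()
                         for _ in range(n_draws)])

    print(clt_aggregate(10000).mean(), brute_force(1000).mean())   # both close to 1000 * m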

11.2 Aggregate Distributions
11.2.1 Moments of an aggregate distribution
There are general formulae for determining the moments of an aggregate distribution given that one has
the moments for the frequency distribution for n and the severity distribution for X. If the frequency
distribution has mean, variance and skewness of μF, VF and SF respectively, and the severity distribution
has mean, variance and skewness of μC, VC and SC respectively, then the aggregate distribution has
the following moments:
Mean = μF μC                                                                  (11.1)
Variance = μF VC + VF μC^2                                                    (11.2)
Skewness = (μF SC VC^(3/2) + 3 VF μC VC + SF VF^(3/2) μC^3) / (Variance)^(3/2)    (11.3)

There is also a formula for kurtosis, but it is rather ugly. The ModelRisk function VoseAggregateMoments determines the first four moments of an aggregate distribution for any frequency and severity
distribution, even if they are bounded and/or shifted.
Equations (11.1) to (11.3) deserve a little more exploration. Firstly, let's consider the situation where
n is a fixed value, so μF = n, VF = 0 and SF is undefined. Then we have moments for the aggregate
distribution of

Mean = n μC
Variance = n VC
Skewness = SC / √n

You can see that this gives support to the central limit theorem, which states that, if n is large enough,
the aggregate distribution approaches a normal distribution with mean = n μC and variance = n VC. The
skewness equation shows that the aggregate skewness is proportional to the skewness of X but decreases
rapidly at first with increasing n, then more slowly, and asymptotically towards zero.
Another interesting example is to consider the aggregate moment equations when n follows a
Poisson(λ) distribution, which is very commonly the most appropriate distribution for n, and also has
the convenience of being described by just one parameter. Now we have μF = λ, VF = λ and SF = 1/√λ,
and the aggregate moments are

Mean = λ μC
Variance = λ (VC + μC^2)
Skewness = (SC VC^(3/2) + 3 μC VC + μC^3) / ((VC + μC^2)^(3/2) √λ)

The mean and variance equations are simple formulae. We can see that the skewness decreases with
1/√λ in the same way as it does for a fixed value for n. If X is symmetrically distributed, then, for a
given λ, the skewness is at its maximum when the mean and standard deviation of X are the same, and
at its lowest when the standard deviation is very high. Thus, the aggregate distribution will be more
closely normal when VC is large.
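A Python check of Equations (11.1) to (11.3) for a Poisson frequency and a gamma severity, compared against brute-force simulation; the parameter values are illustrative.

    import numpy as np
    from scipy import stats

    def aggregate_moments(muF, VF, SF, muC, VC, SC):
        """Mean, variance and skewness of the aggregate (Equations 11.1 to 11.3)."""
        mean = muF * muC
        var = muF * VC + VF * muC**2
        skew = (muF * SC * VC**1.5 + 3 * VF * muC * VC + SF * VF**1.5 * muC**3) / var**1.5
        return mean, var, skew

    lam, a, b = 25.0, 2.0, 6.0                       # Poisson(25) frequency, Gamma(2, 6) severity
    sev = stats.gamma(a, scale=b)
    print(aggregate_moments(lam, lam, 1 / np.sqrt(lam),
                            sev.mean(), sev.var(), float(sev.stats(moments="s"))))

    # Brute-force check by simulation.
    rng = np.random.default_rng(5)
    counts = stats.poisson(lam).rvs(20000, random_state=rng)
    sums = np.array([sev.rvs(n, random_state=rng).sum() for n in counts])
    print(sums.mean(), sums.var(), stats.skew(sums))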
Being able to determine the aggregate moments is pretty useful. One can directly compare sums
of random variables, which I will discuss more in Chapter 21. One can also match these moments to
some parametric distribution and use that as an approximation to the aggregate distribution. An aggregate
distribution is almost always right skewed, so we can select from a number of right-skewed distributions
like the lognormal and gamma and match moments. For example, a Gamma(α, β) distribution shifted
by a value T has

Mean = αβ + T                                                                 (11.4)
Variance = αβ^2                                                               (11.5)
Skewness = 2/√α                                                               (11.6)

Thus, matching skewness gives us a value for α. Then, matching variance gives us β, and, finally,
matching mean gives us T. Adding a shift gives us three parameters to estimate, so we can match three
moments. The model in Figure 11.4 offers an example.
Cells C3:C5 are the parameters for the model. Cells D3 and D4 use ModelRisk functions to create
distribution objects. B8:C11 and D8:E11 use the VoseMoments function to calculate the moments of the
two distributions. Alternatively, you can use the equations in the distribution compendium in Appendix
III. F8:F10 manually calculates the first three aggregate moments from Equations (11.1) to (11.3),
and G8:H11 calculates all four using the VoseAggregateMoments function as a check. In C15:C17,
Equations (11.4) to (11.6) are inverted to determine the gamma distribution parameters. Finally, G14:H17
uses the VoseMoments function again to determine the moments of the gamma distribution. You can see
that they match the mean, variance and skewness of the aggregate distribution - as they should - but
also that the kurtosis is very close, so the gamma distribution would likely be a good substitute for
the aggregate distribution. To be sure, we would need to plot the two together, which we'll look

Figure 11.4 Model for determining aggregate moments.

at later: a feature in ModelRisk uses the matching moments principle to match shifted versions of
the gamma, inverse gamma, lognormal, Pearson5, Pearson6 and fatigue distributions to constructed
aggregate distributions and overlay the distributions for an extra visual comparison.
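A short Python sketch of the moment-matching step: invert Equations (11.4) to (11.6) to find the shifted gamma with the same first three moments as the aggregate. The Poisson and severity moments used here are illustrative.

    import numpy as np

    def shifted_gamma_from_moments(mean, var, skew):
        """Invert Equations (11.4) to (11.6): return (alpha, beta, shift T)."""
        alpha = (2 / skew) ** 2          # skewness = 2 / sqrt(alpha)
        beta = np.sqrt(var / alpha)      # variance = alpha * beta^2
        T = mean - alpha * beta          # mean = alpha * beta + T
        return alpha, beta, T

    # Aggregate moments for a Poisson(lam) frequency and a severity with moments muC, VC, SC.
    lam, muC, VC, SC = 25.0, 12.0, 50.0, 1.0
    mean = lam * muC
    var = lam * (VC + muC**2)
    skew = (SC * VC**1.5 + 3 * muC * VC + muC**3) / ((VC + muC**2) ** 1.5 * np.sqrt(lam))

    print(shifted_gamma_from_moments(mean, var, skew))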

11.2.2 Methods for constructing an aggregate distribution
In this section I want to turn to a range of very neat techniques for constructing the aggregate distribution
when n is a random variable and X are independent identically distributed random variables. There are
a lot of advantages to being able to construct such an aggregate distribution, among which are:

• We can determine tail probabilities to a high precision.
• It is much faster than Monte Carlo simulation.
• We can manipulate the aggregate distribution as with any other in Monte Carlo simulation, e.g.
  correlate it with other variables.

The main disadvantage to these methods is that they are computationally intensive and need to run
calculations through often very long arrays. This makes them impractical to show in a spreadsheet


environment, so I will only describe the theory here. All methods are implemented in ModelRisk,
however, which runs the calculations internally in C++.
We start by looking at the Panjer recursive method, and then the fast Fourier transform (FFT) method.
These two have a similar feel to them, and similar applications, although their mathematics is quite
different. Then we'll look at a multivariate FFT method that allows us to extend the aggregate calculation
to a set of {n, X} variables. The de Pril recursive method is similar to Panjer's and has a specific use.
Finally, I give a summary of these methods and when and why they are useful.
Panjer's recursive method

Panjer's recursive method (Panjer, 1981; Panjer and Willmot, 1992) applies where the number of variables n being added together follows one of these distributions:

• binomial;
• geometric;
• negative binomial;
• Poisson;
• Pólya.

The technique begins by taking the claim size distribution and discretising it into a number of values
with increment C. Then the probability is redistributed so that the discretised claim distribution has the
same mean as the continuous variable. There are a few ways of doing this, but if the discretisation steps
are small they give essentially the same answer. A simple method is to assign the value (i * C) the
probability si as follows:

In the discretisation process we have to decide on a maximum value of i (called r) so we don't have an
infinite number of calculations to perform. Now comes the clever part. The above discrete distributions
lead to a simple one-time summation through a recursive formula to calculate the probability p ( j ) that
the aggregate distribution will equal j * C:

The formula works for all frequency distributions for n that are of the (a, b, 0) class, which means that,
from P(n = 0) up, we have a recursive relationship between P(n = i ) and P(n = i - 1) of the form

a and b are fixed values that depend on which of the discrete distributions is used and their parameter
value. The specific formula for each case is given below for the (a, b, 0) class of discrete distributions:

For the Binomial(n,p)


For the Geometric(p)

For the NegBin(s,p)

For the Poisson(λ)
p0 = exp[λ s0 − λ], a = 0, b = λ

For the Pólya(α, β)

The output of the algorithm is two arrays {i}, {p(i)} that can be constructed into a distribution, for
example as VoseDiscrete({i}, {p(i)}) * C. Panjer's method can occasionally numerically "blow up" with
the binomial distribution, but when it does so it generates negative probabilities, so the problem is
immediately obvious.
A small change to Panjer's algorithm allows the formula to be applied to (a, b, 1) distributions, which
means that the recursive formula (11.8) works from P(n = 1) onwards. This allows us to include the
logarithmic distribution using the formulae

Panjer's method cannot, however, be applied to the Delaporte distribution. Panjer's method requires
a bit of hands-on management because one has to experiment with the maximum value r to ensure
sufficient coverage and accuracy of the distribution. ModelRisk uses two controls for this: MaxP specifies
the upper percentile value of the distribution of X at which the algorithm will stop, and Intervals specifies
how many steps will be used in the discretisation of the X distribution. In general, the larger one makes
Intervals, the more accurate the model will be, but at the expense of computation time. The MaxP value
should be set high enough to realistically cover the distribution of X but, if one sets it too high for
a long-tailed distribution, there will be an insufficient number of increments in the main body of the
distribution. In ModelRisk one can compare the exact moments of the aggregate distribution with those
of the Panjer constructed distribution to ensure that the two correspond with sufficient accuracy for the
analyst's needs.
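A Python sketch of Panjer's recursion for a Poisson frequency. The mid-point discretisation used below is one simple choice, and the step size and parameters are assumptions for illustration rather than ModelRisk's exact algorithm.

    import numpy as np
    from scipy import stats

    def panjer_poisson(lam, severity, C, r):
        """Aggregate pmf on the grid {0, C, 2C, ..., r*C} for a Poisson(lam) frequency.

        severity is a frozen scipy.stats distribution for X (non-negative);
        s_i = F((i + 0.5)C) - F((i - 0.5)C) is used as a simple discretisation of X.
        """
        i = np.arange(r + 1)
        s = severity.cdf((i + 0.5) * C) - severity.cdf(np.maximum(i - 0.5, 0) * C)
        a, b = 0.0, lam                                  # (a, b, 0) values for the Poisson
        p = np.zeros(r + 1)
        p[0] = np.exp(lam * s[0] - lam)                  # starting value p(0)
        for j in range(1, r + 1):
            k = np.arange(1, j + 1)
            p[j] = ((a + b * k / j) * s[k] * p[j - k]).sum() / (1 - a * s[0])
        return i * C, p

    x, p = panjer_poisson(lam=20, severity=stats.gamma(2, scale=6), C=2.0, r=1000)
    print(p.sum(), (x * p).sum())     # total probability close to 1; mean close to 20 * 12 = 240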


Fast Fourier transform (FFT) method

The density function f(x) of a continuous random variable can always be converted into its Fourier
transform φx(t) (also called its characteristic function) as follows:

φx(t) = ∫ e^(itx) f(x) dx = E[e^(itx)]

where the integral runs over the range [min, max] of x, and we can transform back using

f(x) = (1/(2π)) ∫ e^(-itx) φx(t) dt

Characteristic functions are really useful for determining the sums of random variables because
φX+Y(t) = φX(t) * φY(t), i.e. we just multiply the characteristic functions of variables X and Y to
get the characteristic function of (X + Y). For example, the characteristic function for a normal distribution is φ(t) = exp(iμt - σ²t²/2). Thus, for variables X = Normal(μX, σX) and Y = Normal(μY, σY)
we have

φX+Y(t) = φX(t) φY(t) = exp(i(μX + μY)t - (σX² + σY²)t²/2)

In this particular example, the functional form of φX+Y(t) equates to another normal distribution with
mean (μX + μY) and variance (σX² + σY²), so we don't have to apply a transformation back - we can
already recognise the result.
The fast Fourier transform method of constructing an aggregate distribution where there are a random
number n of identically distributed random variables X to be summed is described fully in Robertson
(1992). The technique involves discretising the severity distribution X like Panjer's method so that one
has two sets of discrete vectors, one each for the frequency and severity distributions. The mathematics
involves complex numbers and is based on the convolution theory of discrete Fourier transforms, which
states that to obtain the aggregate distribution one multiplies the two discrete Fourier transforms of these
vectors pointwise and then computes the inverse discrete Fourier transform. The fast Fourier transform
is used as a very quick method for computing the discrete Fourier transform for long vectors.
The main advantage of the FFT method is that it is not recursive, so, when one has a large array
of possible values, the FFT won't suffer the same error propagation that Panjer's recursion will. The
FFT can also take any discrete distribution for its frequency distribution (and, in principle, any other
non-negative continuous distribution if one discretises it). The FFT can also be started away from zero,
whereas the Panjer method must calculate the probability of every value starting at zero. Thus, as a
rough guide, consider using Panjer's method where the frequency distribution does not take very large
values and where it is one of those for which Panjer's method applies, otherwise use the FFT method.
ModelRisk offers a version of the FFT method with some adjustments to improve efficiency and allow
for a continuous aggregate distribution.
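A Python sketch of the FFT construction for a Poisson frequency using NumPy's FFT: discretise the severity, transform, apply the Poisson probability generating function pointwise and invert. The grid length and discretisation are illustrative assumptions.

    import numpy as np
    from scipy import stats

    def fft_aggregate_poisson(lam, severity, C, m):
        """Aggregate pmf on {0, C, ..., (m-1)C} for Poisson(lam) many iid severities.

        m should be large enough (ideally a power of two) to hold essentially all the mass.
        """
        i = np.arange(m)
        s = severity.cdf((i + 0.5) * C) - severity.cdf(np.maximum(i - 0.5, 0) * C)
        phi = np.fft.fft(s)                              # transform of the discretised severity
        agg = np.fft.ifft(np.exp(lam * (phi - 1)))       # Poisson pgf applied pointwise, inverted
        return i * C, agg.real

    x, p = fft_aggregate_poisson(lam=20, severity=stats.gamma(2, scale=6), C=1.0, m=4096)
    print(p.sum(), (x * p).sum())                        # close to 1 and to 20 * 12 = 240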



FFT methods can also be extended to a group of {n, X} paired distributions, which ModelRisk makes
available via its VoseAggregateMultiFFT function.
De Pril method

For a portfolio of n independent life insurance policies, each policy y has a particular probability of
a claim p_y in some period (usually a year) and a benefit B_y. There are various methods for calculating
the aggregate payout. Dickson (2005) is an excellent (and very readable) review of these methods and
other areas of insurance risk and ruin.
The De Pril method is an exact method for determining the aggregate payout distribution. The compound Poisson approximation discussed next is a faster method that will usually work too.
De Pril (1986) offers an exact calculation of the aggregate distribution under the assumptions that:

• The benefits are fixed values rather than random variables and take integer multiples of some
  convenient base (e.g. $1000) with a maximum value M * base, i.e. B_i = (1 ... M) * base.
• The probability of claims can similarly be grouped into a set of J values (i.e. into tranches of
  mortality rates) p_y = {p_1 ... p_J}.
Let n_ij be the number of policies with benefit i and probability of claim p_j. Then De Pril's paper
demonstrates that p(y), the probability that the aggregate payout will be equal to y * base, is given by
the recursive formula

p(y) = (1/y) Σ[i=1 to min(y,M)] Σ[k=1 to ⌊y/i⌋] p(y − ik) h(i, k)   for y = 1, 2, 3 ...

with a starting value p(0) and an auxiliary function h(i, k) as defined in De Pril (1986).

The formula has the benefit of being exact, but it is very computationally intensive. However, the
number of computations can usually be significantly reduced if one accepts ignoring small aggregate
costs to the insurer. Let K be a positive integer. Then the recursive formulae above are modified
accordingly (the modified formulae are given in Dickson, 2005). Dickson (2005) recommends using a
value of 4 for K. The De Pril method can be seen as the
counterpart to Panjer's recursive method for the collective model. ModelRisk offers a set of functions
for implementing De Pril's method.


Compound Poisson approximation

The compound Poisson approximation assumes that the probability of payout for an individual policy
is fairly small - which is usually true - but has the advantage over the De Pril method of allowing the
payout distribution to be a random variable rather than a fixed amount.
Let n_j be the number of policies with probability of claim p_j. The number of payouts in this stratum
is therefore Binomial(n_j, p_j). If n_j is large and p_j is small, the binomial is well approximated by a
Poisson(n_j * p_j) = Poisson(λ_j) distribution. The additive property of the Poisson distribution tells us
that the frequency distribution for payouts over all groups of lines of insurance is given by

λ_all = Σ_j λ_j

and the total number of claims = Poisson(λ_all).
The probability that one of these claims, randomly selected, comes from stratum j is given by λ_j / λ_all.
Let F_j(x) be the cumulative distribution function for the claim size of stratum j. The probability that
a random claim is less than or equal to some value x is therefore

F(x) = Σ_j (λ_j / λ_all) F_j(x)

Thus, we can consider the aggregate distribution for the total claims to have a frequency distribution
equal to Poisson(λ_all) and a severity distribution given by F(x).
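A Python sketch of the compound Poisson approximation for a small invented portfolio; the strata, claim probabilities and severity distributions are purely illustrative.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Each stratum: (number of policies, claim probability, claim size distribution).
    strata = [(5000, 0.002, stats.lognorm(s=0.5, scale=20000)),
              (2000, 0.005, stats.lognorm(s=0.7, scale=50000))]

    lam = np.array([n * p for n, p, _ in strata])        # Poisson rate per stratum
    lam_all = lam.sum()
    weights = lam / lam_all                              # P(a random claim comes from stratum j)

    def simulate_total_payout(iterations=10000):
        totals = np.empty(iterations)
        for it in range(iterations):
            n_claims = rng.poisson(lam_all)
            counts = rng.multinomial(n_claims, weights)  # allocate claims to strata
            totals[it] = sum(dist.rvs(c, random_state=rng).sum()
                             for (_, _, dist), c in zip(strata, counts))
        return totals

    print(simulate_total_payout().mean())   # ~ sum_j lam_j * E[claim size in stratum j]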
Adding correlation in aggregate calculations
Simulation

The most common method for determining the aggregate distribution of a number of correlated random
variables is to simulate each random variable in its own spreadsheet cell, using one of the correlation
methods described elsewhere in this book, and then sum them up in another cell. For example, the
model in Figure 11.5 adds together Poisson(100) random variables each following a Lognormal(2, 5)
distribution but where these variables are correlated through a Clayton(10) copula.
Cell C7 determines the 99.99th percentile of the Poisson(100) distribution - a value of 139 - which
is used as a guide to set the maximum number of rows in the table. The Clayton copula values are
used as "U-parameter" inputs into the lognormal distributions, meaning that they make the lognormal
distributions return the percentile equating to the copula value; for example, cell D12 returns a value of
2.5539..., which is the 80.98...th percentile of the Lognormal(2, 5) distribution.

Figure 11.5 Model for simulating the aggregate distribution of correlated random variables. Formulae table (partial): =VosePoisson(C2); =VosePoisson(C2, 0.9999) (the 99.99th percentile used to size the table); Total (output): =SUM(D12:D151).

A Clayton copula provides a particularly high level of correlation of the variables at their low end.
For example, the plot in Figure 11.6 shows the level of correlation of two variables with a Clayton(10).
Thus, the model will produce a wider range for the sum than an uncorrelated set of variables but
in particular will produce more extreme low-end values from a probabilistic view (the correlated sum
has about a 70% probability of taking a lower value than the uncorrelated sum). The use of one of
the Archimedean copulas is an appropriate tool here because we are adding up a random number of
these variables but the number being summed does not affect the copula's behaviour - all variables will
be related to the same degree no matter how many are being summed. The effect of the correlation is
readily observed by repeating the model without any correlation. The plot in Figure 11.7 compares the
two cumulative distributions.
Complete correlation

In the situation where the source of the randomness or uncertainty of the distribution associated with
a random variable is the same for the whole group you are adding up, there is really just one random
variable. For example, imagine that a railway network company must purchase 127 000 sleepers (the
beams under the rails) next year. The sleepers will be made of wood, but the price is uncertain because
the cost of timber may fluctuate. It is estimated that the cost will be $PERT(22.1, 22.7, 33.4) each. If
all the timber is being purchased at the same time, it might be reasonable to believe that all the sleepers
will have the same price. In that case, the total cost can be modelled simply as:

Total cost = 127 000 * PERT(22.1, 22.7, 33.4)
If there are a large number n of random variables Xi(i = 1, . . . , n) being summed and the uncertainty of
the sum is not dominated by a few of these distributions, the sum is approximately normally distributed


Figure 11.6 Correlation of two variables with a Clayton(10).

Figure 11.7 Comparison of correlated and uncorrelated sums (cumulative probability against the value of the sum).


according to the central limit theorem as follows:

Sum of Xi ≈ Normal( Σi μi , √(Σi Σj σij) )

The equation states that the aggregate sum takes a normal distribution with a mean equal to the sum
of the means for the individual distributions being added together. It also states that the variance (the
square of the standard deviation in the formula) of the normal distribution is equal to the sum of the
covariance terms σij between each pair of variables. The covariance terms σij are calculated as follows:

σij = ρij σi σj = E[(Xi − μi)(Xj − μj)]

where σi and σj are the standard deviations of variables i and j, ρij is the correlation coefficient and
E[.] means "the expected value of" the thing in the brackets.
If we have datasets for the variables being modelled, Excel can calculate the covariance and correlation
coefficients using the functions COVAR() and CORREL() respectively. If we were thinking of using a
rank order correlation matrix, each element corresponds reasonably accurately to ρij for roughly normal
distributions (at least, not very heavily skewed distributions), so the standard deviation of the normally
distributed sum could be calculated directly from the correlation matrix.
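A Python sketch of this normal approximation: build the covariance terms from the standard deviations and a correlation matrix, then sum the means and all the covariances. The three variables used here are invented for illustration.

    import numpy as np

    means = np.array([100.0, 250.0, 60.0])          # means of the variables being summed
    sds = np.array([15.0, 40.0, 9.0])               # their standard deviations
    corr = np.array([[1.0, 0.3, 0.1],               # correlation matrix rho_ij
                     [0.3, 1.0, -0.2],
                     [0.1, -0.2, 1.0]])

    cov = corr * np.outer(sds, sds)                 # sigma_ij = rho_ij * sigma_i * sigma_j
    sum_mean = means.sum()
    sum_sd = np.sqrt(cov.sum())                     # sum over all covariance terms, then sqrt

    rng = np.random.default_rng(4)
    approx_sum = rng.normal(sum_mean, sum_sd, size=10000)   # normal approximation to the sum
    print(sum_mean, sum_sd)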
Correlating partial sums

We will sometimes be in the situation of having two or more sums of random variables that have some
correlation relationship between them. For example, imagine that you are a hospital trying to forecast the
number of patient-days you will need to provide next year, and you split the patients into three groups:
surgery, maternity and chronic illness (e.g. cancer). Let's say that the distribution of days that a person
will spend in hospital under each category is independent of the other categories, but the number of individuals being treated is correlated with the number of people in the catchment area, which is uncertain
because hospital catchments are being redefined in your area. There are plenty of ways to model this problem, but perhaps the most convenient is to start with the uncertainty about the number of people in
the catchment area and derive what the demand will be for each type of care as a consequence, then make
a projection of what the total patient-days might be as a result, as shown in the model in Figure 11.8.
In this model the uncertainty about the catchment area population is modelled with a PERT distribution, the bed-days for each category of healthcare are modelled by lognormal distributions with
different parameters, and the number of patients in each category is modelled with a Poisson distribution
with a mean equal to (population size in 000s) * (expected cases/year/1000 people). I have shown three
different methods for simulating the aggregate distribution in each class: pure Monte Carlo for surgery,
FFT for maternity and Panjer's recursive method for chronic illness. Any of the three could be used to model
each category. You'll notice that the Monte Carlo method is slightly different from the others in that
I've used VosePoisson(...) instead of VosePoissonObject(...) because the VoseAggregateMC function
requires a numerical input for how many variables to sum (allowing the flexibility that this could be a
calculated value), whereas the FFT and Panjer methods perform calculations on the Poisson distribution
and therefore need it to be defined as an object. Note that the same model could be achieved with other
Monte Carlo simulation software by making randomly varying arrays for each category, the technique
illustrated in Figure 11.1, but the numbers in this problem would require very long arrays.
Using the same basic problem, let us now consider the situation where the frequency distribution for
each category is correlated in some fashion, as we had before, but not because of their direct relationship


Figure 11.8 Model for forecasting the number of patient-days in a hospital. Formulae table: C2: =VosePERT(82,107,163); C7: =VosePoisson(C6*$C$2); D7:E7: =VosePoissonObject(D6*$C$2); C8:E8 (with different values): =VoseLognormalObject(6.3,36.7); C9: =VoseAggregateMC(C7,C8); D9: =VoseAggregateFFT($D$7,$D$8,,); E9: =VoseAggregatePanjer(E7,E8,20000,0.999); C11 (output): =SUM(C9:E9).
Figure 11.9 Using a normal copula to correlate the Poisson frequency distributions. Correlation matrix (Surgery, Maternity, Chronic): Surgery 1, -0.3, 0.2; Maternity -0.3, 1, -0.25; Chronic 0.2, -0.25, 1. Formulae table: {C11:E11}: {=VoseCopulaMultiNormal(C7:E9)}; C16:E16: =VosePoisson(C15*$D$2,C11); C17:E17 (with different values): =VoseLognormalObject(6.3,36.7); C18:E18: =VoseAggregateMC(C16,C17); C20 (output): =SUM(C18:E18).

to any observable variable. Imagine that the population size is known, but we want to model the effects
of increased pollution in the area, so we want the surgery and chronic Poisson variables to be positively
correlated with each other but negatively correlated with maternity. The following model uses a normal
copula to correlate the Poisson frequency distributions (Figure 11.9).


There is in fact an FFT method to achieve this correlation between frequency distributions, but the
algorithm is not particularly stable.
Turning now to the severity (length of hospital stay) distributions, we may wish to correlate the length
of stay for all individuals in a certain category. In the above model, this can be achieved by creating a
separate scaling variable for each lognormal distribution with a mean of 1, for example a Gamma(1/h², h²)
distribution with the required mean and a standard deviation of h (Figure 11.10). Note that this means
that the lognormal distributions will no longer have the standard deviations they were given before.
Finally, let's consider how to correlate the aggregate distributions themselves. We can construct the
distribution of the number of bed-days required for each type of healthcare using either the FFT or Panjer
method. Since the distribution is constructed rather than simulated, we can easily correlate the aggregate
distributions by controlling how they are sampled. In the example in Figure 11.11, the model uses the
FFT method to construct the aggregate variables and correlates them together using a Frank copula.
Figure 11.10 Creating separate scaling variables for each lognormal distribution. Formulae table: C7:E7: =VosePoisson(C6*$C$2); C9:E9: =VoseGamma(1/C8^2,C8^2); C10:E10 (with different values): =VoseLognormalObject(6.3*C9,36.7*C9); C11:E11: =VoseAggregateMC(C7,C10); C13 (output): =SUM(C11:E11).
Figure 11.11 Using the FFT method to combine correlated aggregate variables. Formulae table: C7:E7: =VosePoissonObject(C6*$C$2); C8:E8 (with different values): =VoseLognormalObject(6.3,36.7); {C9:E9}: {=VoseCopulaMultiFrank(15)}; C10:E10: =VoseAggregateFFT($D$7,$D$8,,C9); C12 (output): =SUM(C10:E10).

Figure 11.12 Model that calculates the distribution for n.

11.2.3 Number of variables to reach a total
So far in this chapter we have focused on determining the distribution of the sum of a (usually random)
number of random variables. We are also often interested in the reverse question: how many random
variables will it take to exceed some total? For example, we might want to answer the following
questions:

• How many random people entering a lift will it take to exceed the maximum load allowed?
• How many sales will a company need to make to reach its year-end target?
• How many random exposures to a chemical will it take to reach the exposure limit?


Some questions like this are directly answered by known distributions, for example the negative
binomial, beta-negative binomial and inverse hypergeometric describe how many trials will be needed
to achieve s successes for the binomial, beta-binomial and hypergeometric processes respectively.
However, if the random variables are not 0 or 1 but are continuous distributions, there are no distributions
available that are directly useful.
The most general method is to use Monte Carlo simulation with a loop that consecutively adds a
random sample from the distribution in question until the required sum is produced. ModelRisk offers
such a function called VoseStopSum(Distribution, Threshold). This can, however, be quite computationally intensive when the required number is large, so it would be useful to have some quicker methods
available.
Table 11.3 gives us some identities that we can use. For example, the sum of n independent variables following a Gamma(α, β) distribution is equal to a Gamma(nα, β). If we require a total of at least T, then the probability that (n − 1) Gamma(α, β) variables will exceed T is 1 − F(n−1)(T), where F(n−1)(T) is the cumulative probability for a Gamma((n − 1)α, β). Excel has the GAMMADIST function which calculates F(x) for a gamma distribution (ModelRisk has the function VoseGammaProb which performs the same task but without the errors GAMMADIST sometimes produces). The probability that n variables will exceed T is given by 1 − Fn(T). Thus, the probability that it was the nth random variable that took the sum over the threshold is (1 − Fn(T)) − (1 − F(n−1)(T)) = F(n−1)(T) − Fn(T). You can therefore construct a model that calculates the distribution for n directly, as shown in the spreadsheet in Figure 11.12.
The same idea can be applied with the Cauchy, chi-square, Erlang, exponential, Lévy, normal and Student distributions. The VoseStopSum function in ModelRisk implements this shortcut automatically.
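If you are working outside Excel, the same calculation is easy to script. The following minimal Python sketch builds the distribution of n directly from the gamma cumulative probabilities; the Gamma parameters and the threshold are illustrative assumptions, not values from the text.

import numpy as np
from scipy.stats import gamma

alpha, beta = 2.0, 5.0   # assumed shape and scale of each summed variable
T = 100.0                # assumed threshold for the running total

def prob_nth_crosses(n):
    # P(n) = F_{n-1}(T) - F_n(T), where F_k is the cdf of a Gamma(k*alpha, beta)
    F = lambda k: gamma.cdf(T, a=k * alpha, scale=beta) if k > 0 else 1.0
    return F(n - 1) - F(n)

dist_n = {n: prob_nth_crosses(n) for n in range(1, 31)}
print(round(sum(dist_n.values()), 4))   # close to 1 if 30 variables is far enough
print(max(dist_n, key=dist_n.get))      # most likely number of variables needed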

Chapter 12

Forecasting with uncertainty

[Dilbert cartoon © United Feature Syndicate, Inc. Syndicated by Bruno Productions B.V. Reproduced by permission.]
This chapter looks at several forecasting methods in common use and how variability and uncertainty
can be incorporated into their forecasts. Time series modelling is usually based on extrapolating a set
of observations from the past, or, where data are not available or inadequate, the modelling focuses on
expert opinion of how the variable may behave in the future. In this chapter we will look first of all
at the more formal techniques of time series modelling based on past observations, then look at some
ways that the reader may find useful to model expert opinion of what the future holds.
The prerequisites of formal quantitative forecasting techniques are that a reliable time series of past
observations is available and that it is believed that the factors determining the patterns exhibited in
that time series are likely to continue to exist, or, if not, that we can determine the effect of changes in
these factors. We begin by discussing ways of measuring the performance of a forecasting technique.
Then we look at the naïve forecast, which is simply repeating the last, deseasonalised, value in the
available time series. This simplistic forecasting technique is useful for providing a benchmark against
which the performance of the other techniques can be compared. This is followed by a look at various
forecasting techniques, divided into three sections according to the length of the period that is to be
forecast. Finally, we will look at a couple of examples of a different approach that aim at modelling
the variability based on a reasonable theoretical model of the actual system.
There are a few useful basic tips I recommend when you are producing a stochastic time series as
part of your risk analysis:
Check the model's behaviour with embedded Excel x-y scatter plots.
Split the model up into components rather than create long, complicated formulae. That way you'll
see that each component is working correctly, and therefore have confidence in the time series
projection as a whole.

Figure 12.1 Six plots from the same geometric Brownian motion model. Each pattern could easily be what
follows on from any other pattern.

Be realistic about the match between historic patterns and projections. For example, write a simple geometric Brownian motion model, plot the series and hit the F9 key (recalculate) a few
times and see the variation in patterns you get. Remember that these all come from the same
stochastic model - but they will often look convincingly different (see Figure 12.1): if any of these
had been our historical data, a statistical analysis of the data would have tended to agree with
and reinforce any preconception about the appropriate model, because statistical analysis
requires you to specify the model to test. So, don't always go for a forecast model because it
fits the data the best - also look at whether there is a logical reason for choosing one model over
another.
Be creative. Short-term forecasts (say 20-30% of the historic period for which you have good
data) are often adequately produced from a statistical analysis of your data. Even then, be selective
about the model. However, beyond that timeframe we move into crystal ball gazing. Including your
perceptions of where the future may go, possible influencing events, etc., will be just as valid as an
extrapolation of historic data.

12.1 The Properties of a Time Series Forecast
When producing a risk analysis model that forecasts some variable over time, I recommend you go
through a list of several properties that variable might exhibit over time, as this will help you both
statistically analyse any past data you have and select the most appropriate model to use. The properties
are: trend, randomness, seasonality, cyclicity or shocks and constraints.

Figure 12.2 Examples of expected value trend over time.

12.1.1 Trend
Most variables we model have a general direction in which they have been moving, or in which we
believe they will move in the future. The four plots in Figure 12.2 give some examples of the expected
value of a variable over time: top left - a steady relative decrease, such as one might expect for sales
of an old technology, or the number of individuals remaining alive from a group; top right - a steady
(straight-line) increase, such as is often assumed for financial returns over a reasonably short period
(sometimes called "drift"); bottom left - a steady relative increase, such as bacterial growth or take-up
of new technology; and bottom right - a drop turning into an increase, such as the rate of component
failures over time (like the bathtub curve in reliability modelling) or advertising expenditure (more at a
launch, then lower, then ramping up to offset reduced sales).

12.1.2 Randomness
The second most important property is randomness. The four plots in Figure 12.3 give some examples
of the different types of randomness: top left - a relatively small and constant level of randomness that
doesn't hide the underlying trend; top right - a relatively large and constant level of randomness that
can disguise the underlying trend; bottom left - a steadily increasing randomness, which one typically
sees in forecasting (care needs to be taken to ensure that the extreme values don't become unrealistic);
and bottom right - levels of randomness that vary seasonally.

Figure 12.3 Examples of the behaviour of randomness over time.

12.1.3 Seasonality
Seasonality means a consistent pattern of variation in the expected value (but also sometimes its randomness) of the variable. There can be several overlaying seasonal periods, but we should usually have
a pretty good guess at what the periods of seasonality might be: hour of the day; day of the week; time
of the year (summer/winter, for example, or holidays, or end of financial year). The plot in Figure 12.4
shows the effect of two overlaying seasonal periods. The first is weekly with a period of 7, the second
is monthly with a period of 30, which complicates the pattern. Monthly seasonality often occurs with
financial transactions that take place on a certain day of the month: for example, volumes of documents
that a bank's printing facility must produce each day - at the end of the month they have to churn out
bank and credit card statements and get them in the post within some legally defined time.
One difficulty in analysing monthly seasonality from data is that months have different lengths, so one
cannot simply investigate a difference each 30 days, say. Another hurdle in analysing data on variables
with monthly and holiday peaks is that there can be some spread of the effect over 2 or 3 days. For
example, we performed an analysis recently looking at the calls received into a US insurance company's
national call centre to help them optimise how to staff the centre. We were asked to produce a model
that predicted every 15 minutes for the next 2 weeks, and another model to predict out 6 weeks. We
looked at the patterns by individual state and language (Spanish and English). There was a very obvious
and stable pattern through the day that was constant during the working week, but a different pattern
on Saturday and on Sunday. The pattern was largely the same between states but different between
languages. Holidays like Thanksgiving (the fourth Thursday of November, so not even a fixed date) were

Figure 12.4 The expected value of a variable with two overlapping seasonal periods.

Figure 12.5 Effect of holidays on daily calls to a call centre (panels: variation around Memorial Day; variation around Thanksgiving Day). The four lines show the effect in each of the last 4 years. Zero on the x axis is the day of the holiday.

very interesting: call rates dropped hugely on the holiday to 10 % of the level one would have usually
expected, but were slightly lower than normal the day before (Wednesday), significantly lower the day
after (Friday), a little lower during the following weekend and then significantly higher the following
Monday and Tuesday (presumably because people were catching up on calls they needed to make).
Memorial Day, the last Monday of May, exhibited a similar pattern, as shown in Figure 12.5.
The final models had logic built into them to look for forthcoming holidays and apply these patterns to
forecast expected levels which had a trend by state and a daily seasonality. For the 15-minute models we
also had to take into account the time zone of the state, since all calls from around the US were received

into one location, which also involved thinking about when states changed their clocks from summer to winter and little peculiarities like some states having two time zones (Arizona doesn't observe daylight saving, to conserve energy used by air conditioners, etc.).

Figure 12.6 Two examples of the effect of a cyclicity shock. On the left, the shock produces a sudden and sustained increase in the variable; on the right, the shock produces a sudden increase that gradually reduces over time - an exponential distribution is often used to model this reduction.

12.1.4 Cyclicity or shocks
Cyclicity is a confusing term (being rather similar to seasonality) that refers to the effect of obvious
single events on the variable being modelled (Figure 12.6 illustrates two basic forms). For example, the
Hatfield rail crash in the UK on 12 October 2000 was a single event with a long-term effect on the UK
railway network. The accident was caused by the lapsed maintenance of the track which led to "gauge
corner cracking", resulting in the rail separating. Investigators found many more such cracks in the area
and a temporary speed restriction was imposed over very large lengths of track because of fears that
other track might be suffering from the same degradation. The UK network was already at capacity
levels, so slowing down trains resulted in huge delays. The cost of repairs to the undermaintained track
also sent RailTrack, the company managing the network, into administration. In analysing the cause of
train delays for our client, NetworkRail, a not-for-dividend company that took over from RailTrack, we
had to estimate and remove the persistent effect of Hatfield.
Another obvious example is 9/11. Anyone who regularly flies on commercial airlines will have
experienced the extra delays and security checks. The airline industry was also greatly affected, with
several US carriers filing for protection under Chapter 11 of the US Bankruptcy Code, although other
factors also played a part, such as oil price increases and other terrorist attacks (also cyclicity events)
which dissuaded people from going abroad. We performed a study to determine what price should be
charged for parking at a US national airport, part of which included estimating future demand. From
analysing historic data it was evident that the effect of 9/11 on passenger levels was quite immediate,
and, as of 2006, they were only just returning to 2000 levels, where previously there had been consistent
growth in passenger numbers, so levels still remain far below what would have been predicted before
the terrorist attack.
Events like Hatfield and 9/11 are, of course, almost impossible to predict with any confidence.
However, other types of cyclicity event are more predictable. As I write this (20 June 2007), there are
7 days left before Tony Blair steps down as Prime Minister of the UK, which he announced on 10 May,


and Gordon Brown takes over. Newspaper columnists are debating what changes will come about, and,
for people in the know, there are probably some predictable elements.

12.1.5 Constraints
Randomly varying time series projections can quite easily produce extreme values far beyond the range
that the variable might realistically take. There are a number of ways to constrain a model. Mean reversion, discussed later, will pull a variable back to its mean so that it is far less likely to produce extreme values. Simple logical bounds like IF(S_t > 100, 100, S_t) will constrain a variable to remain at or below 100, and one can make the constraining parameter (100) a function of time too. The section describing market modelling below offers some other techniques that are based on constraints derived from modelling the system itself.

12.2 Common Financial Time Series Models
In this section I describe the most commonly used time series for financial models of variables such
as stock prices, exchange rates, interest rates and economic indicators such as producers' price index
(PPI) and gross domestic product (GDP). Although they have been developed for financial markets, I
encourage you to review the ideas and models presented here because they have much wider applications.
Financial time series are considered to vary continuously, even if perhaps we only observe them at
certain moments in time. They are based on stochastic differential equations (SDEs), which are the
most general descriptions of continuously evolving random variables. The problem with SDEs from a
simulation perspective is that they are not always amenable to being exactly converted to algorithms
that will generate random possible observations at specific moments in time, and there are often no
exact methods for estimating their parameters from data. On the other hand, the advantage is that we
have a consistent framework for comparing the time series and there are sometimes analytical solutions
available to us for determining, say, the probability that the variable exceeds some value at a certain point
in time - answers that are useful for pricing derivatives and other financial instruments, for example.
We can get around the problems with a bit of intense computing, as I will explain for each type of time
series.
Financial time series model a variable in one of two forms: the actual price S_t of the stock (or the value of a variable such as an exchange rate, interest rate, etc., if it is not a stock) at some time t, or its return (aka its relative change if it is not an investment) r_t over a period Δt, ΔS/S_t. It might seem that modelling S_t would be more natural, but in fact modelling the return of the variable is often more helpful: apart from making the mathematics simpler, it is usually the more fundamental variable. In this section, I will refer to S_t when talking specifically about a price, to r_t when talking specifically about a return and to x_t when it could be either.
I introduce geometric Brownian motion (GBM) first, as it is the simplest and most common financial time series, the basis of the Black-Scholes model, etc., and the launching pad for a number of more advanced models. I have developed the theory a little for GBM, so you get the feel of the thinking, but keep the theory to a minimum after that, so don't be too put off.
ModelRisk provides facilities (Figure 12.7) to fit and/or model all of the time series described in the chapter. For financial models, data and forecasts can be either returns or prices, and the fitting algorithms can automatically include uncertainty about parameter estimates if required.


Figure 12.7 ModelRisk time series fit window.

12.2.1 Geometric Brownian motion
Consider the formula

x_{t+1} = x_t + Normal(μ, σ)     (12.1)

It states that the variable's value changes in one unit of time by an amount that is normally distributed with mean μ and variance σ². The normal distribution is a good first choice for a lot of variables because we can think of the model as stating (from the central limit theorem) that the variable x is being affected additively by many independent random variables. We can iterate the equation to give us the relationship between x_t and x_{t+2}:

x_{t+2} = x_{t+1} + Normal(μ, σ) = x_t + Normal(2μ, σ√2)

and generalise to any time interval T:

x_{t+T} = x_t + Normal(μT, σ√T)

This is a rather convenient equation because (a) we keep using normal distributions and (b) we can make a prediction between any time intervals we choose. The above equation deals with discrete units of time but can be written in a continuous time form, where we consider any small time interval Δt:

x_{t+Δt} = x_t + Normal(μΔt, σ√Δt)


The SDE equivalent is

dx = μ dt + σ dz
dz = ε√dt     (12.2)

where dz is the generalised Wiener process, called variously the "perturbation", "innovation" or "error", and ε is a Normal(0, 1) distribution. The notation might seem to be a rather unnecessary complication, but when you get used to SDEs they give us the most succinct description of a stochastic time series. A more general version of Equations (12.2) is

dx = f(x, t) dt + g(x, t) dz

where g and f are two functions. It is really just shorthand for writing

x_{t+T} − x_t = ∫_t^{t+T} f(x, s) ds + ∫_t^{t+T} g(x, s) dz
Equation (12.1) can allow the variable x to take any real value, including negative values, so it would not be much good at modelling a stock price, interest rate or exchange rate, for example. However, it has the desirable property of being memoryless, i.e. to make a prediction of the value of x some time T from now, we only need to know the value of x now, not anything about the path it took to get to the present value. We can use Equations (12.2) to model the return of a stock:

dS/S = μ dt + σ dz     (12.3)

There is an identity known as Itô's lemma which states that for a function F of a stochastic variable x following an Itô process of the form dx(t) = a(x, t) dt + b(x, t) dz we have

dF = (∂F/∂x · a(x, t) + ∂F/∂t + ½ · ∂²F/∂x² · b(x, t)²) dt + ∂F/∂x · b(x, t) dz

Choosing F(S) = log[S] together with Equation (12.3), where x = S, a(x, t) = μS and b(x, t) = σS:

d log[S] = (μ − σ²/2) dt + σ dz

Integrating over time T, we get the relationship between some initial value S_t and some later value S_{t+T}:

S_{t+T} = S_t exp[r_T],   r_T = Normal((μ − σ²/2)T, σ√T)     (12.6)


where r_T is called the log return¹ of the stock over the period T. The exp[.] term in Equation (12.6) means that S is always > 0, and we still retain the memoryless property, which corresponds to some financial thinking that a stock's value encompasses all information available about the stock at the time, so there should be no memory in the system (I'd argue against that, personally).
The log return r of a stock S is (roughly) the fractional change in the stock's value. For stocks this is a more interesting value than the stock's actual price because it would be more profitable to own 10 shares in a $1 stock that increased by 6 % over a year than one share in a $10 stock that increased by 4 %, for example.
Equation (12.6) is the GBM model: the "geometric" part comes because we are effectively multiplying lots of distributions together (adding them in log space). From the definition of a lognormal random variable, if ln[S] is normally distributed, then S is lognormally distributed, so Equation (12.6) is modelling S_{t+T} as a lognormal random variable. From the equation of the mean of the lognormal distribution in Appendix III you can see that S_{t+T} has a mean given by

S_t exp(μT)

hence μ is also called the exponential growth rate, and a variance given by

S_t² exp(2μT)(exp(σ²T) − 1)
GBM is very easy to reproduce in Excel, as shown by the model in Figure 12.8, even with different
time increments.
It is also very easy to estimate its parameters from a dataset when the observations have a constant
time increment between them, as shown by the model in Figure 12.9.

Figure 12.8 GBM model with unequal time increments (Mu = 0.01, Sigma = 0.033, starting price 100). Key formulae: C7:C42 =VoseNormal((Mu-(Sigma^2)/2)*(B7-B6),Sigma*SQRT(B7-B6)); D7:D42 =D6*EXP(C7).
¹ Not to be confused with the simple return R_t, which is the fractional increase of the variable over time t, and where r_t = ln[1 + R_t].
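The same GBM logic is easy to reproduce outside a spreadsheet. Below is a minimal Python sketch mirroring the structure of the Figure 12.8 model; Mu, Sigma and the starting price follow the figure, while the exact time grid is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.01, 0.033      # drift and volatility per unit time (as in Figure 12.8)
times = np.array([0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 40, 43, 44, 45, 46, 47, 50], dtype=float)

prices = [100.0]             # starting price
for dt in np.diff(times):
    # log return over an increment of length dt: Normal((mu - sigma^2/2)*dt, sigma*sqrt(dt))
    r = rng.normal((mu - sigma**2 / 2) * dt, sigma * np.sqrt(dt))
    prices.append(prices[-1] * np.exp(r))

print(list(zip(times.tolist(), np.round(prices, 2))))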

Figure 12.9 Estimating GBM model parameters with equal time increments. Key formulae: D4:D105 =LN(C4)-LN(C3) (the log returns); mean =AVERAGE(D4:D105); standard deviation =STDEV(D4:D105); Sigma =G6/SQRT(G2); Mu =G5/G2+G9^2/2, where G2 holds the time increment.
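The same estimator is a few lines of Python; in this minimal sketch the test data are simulated, so the "true" mu and sigma used to generate them are illustrative assumptions.

import numpy as np

def estimate_gbm(prices, dt):
    # mean of the log returns = (mu - sigma^2/2)*dt; standard deviation = sigma*sqrt(dt)
    log_returns = np.diff(np.log(prices))
    sigma = log_returns.std(ddof=1) / np.sqrt(dt)
    mu = log_returns.mean() / dt + sigma**2 / 2
    return mu, sigma

rng = np.random.default_rng(2)
dt = 1.0
r = rng.normal((0.01 - 0.03**2 / 2) * dt, 0.03 * np.sqrt(dt), size=5000)
prices = 100 * np.exp(np.cumsum(r))
print(estimate_gbm(prices, dt))   # should be close to the assumed (0.01, 0.03)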

Figure 12.10 Estimating GBM model parameters with unequal or missing time increments. Key formulae: D4:D187 =(LN(C4)-LN(C3)-(Mu-Sigma^2/2)*(B4-B3))/(Sigma*SQRT(B4-B3)); G5 =ABS(AVERAGE(D4:D187)); G6 =ABS(STDEV(D4:D187)-1); G8 (error sum) =G5+G6.

If there are missing observations or observations with different time increments, it is still possible to estimate the GBM parameters. In the model in Figure 12.10, the observations are transformed to Normal(0, 1) variables {z}, and then Excel's Solver is used to vary Mu and Sigma to make the {z} values have a mean of zero and a standard deviation of 1 by minimising the value of cell G8.
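The Solver approach translates directly into a small optimisation. In the minimal Python sketch below the observation times and prices are hypothetical; the criterion is the same |mean| + |standard deviation − 1| score minimised in cell G8.

import numpy as np
from scipy.optimize import minimize

t = np.array([0, 1, 2, 4, 5, 8, 9, 13, 14, 20], dtype=float)   # hypothetical times, unequal gaps
S = np.array([100.0, 101.2, 100.5, 103.9, 104.4, 108.0, 107.1, 112.6, 113.0, 121.5])

def score(params):
    mu, sigma = params[0], abs(params[1])
    dt = np.diff(t)
    # standardise each log return; for the right (mu, sigma) the z values are Normal(0, 1)
    z = (np.diff(np.log(S)) - (mu - sigma**2 / 2) * dt) / (sigma * np.sqrt(dt))
    return abs(z.mean()) + abs(z.std(ddof=1) - 1)

res = minimize(score, x0=[0.01, 0.05], method="Nelder-Mead")
print(res.x)   # estimated mu and sigma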
An alternative method would be to regress (ln S_{t+Δt} − ln S_t)/√Δt against √Δt with a zero intercept: the slope estimates μ and the standard error estimates σ.
The spread of possible values in a GBM increases rapidly with time. For example, the plot in Figure 12.11 shows 50 possible forecasts with S_0 = 1, μ = 0.001 and σ = 0.02.

Figure 12.11 Plot of 50 possible scenarios from a GBM(μ = 0.001, σ = 0.02) model with a starting value of 1.

Mean reversion, discussed next, is a modification to GBM that progressively encourages the series to move back towards a mean the further it strays away. Jump diffusion, discussed after that, acknowledges that there may be shocks to the variable that result in large discrete jumps. ModelRisk has functions for fitting and projecting GBM, and GBM with mean reversion and/or jump diffusion. The functions work with both returns r and stock prices S.

12.2.2 GBM with mean reversion
The long-run time series properties of equity prices (among other variables) are, of course, of particular
interest to financial analysts. There is a strong interest in determining whether stock prices can be
characterised as random-walk or mean reverting processes because this has an important effect on an
asset's value. A stock price follows a mean reverting process if it has a tendency to return to some
average value over time, which means that investors may be able to forecast future returns better by
using information on past returns to determine the level of reversion to the long-term trend path. A
random walk has no memory, which means that any large move in a stock price following a random-walk process is permanent and there is no tendency for the price level to return to a trend path over
time. The random-walk property also implies that the volatility of stock price can grow without bound
in the long run: increased volatility lowers a stock's value, so a reduction in volatility (Figure 12.12)
owing to mean reversion would increase a stock's value.
For a variable x following a Brownian motion random walk, we have the SDE of Equation (12.2):

dx = μ dt + σ dz

For mean reversion, this equation can be modified as follows:

dx = α(μ − x) dt + σ dz
Figure 12.12 Plots of sample GBM series with mean reversion for different values of alpha (α = 0.0001, 0.1 and 0.4; μ = 0, σ = 0.001).


where α > 0 is the speed of reversion. The effect of the dt coefficient is to produce an expectation of moving downwards if x is currently above μ, and vice versa. Mean reversion models are produced in terms of S or r:

dS = α(μ − S) dt + σ dz     (12.7)

known as the Ornstein-Uhlenbeck process, which was one of the first models used to describe short-term interest rates, where it is called the Vasicek model. The problem with this equation is that we can get negative stock prices; modelling in terms of r, however,

dr = α(μ − r) dt + σ dz

keeps the stock price positive. Integrating this last equation over time gives

r_{t+T} = Normal(μ + exp[−αT](r_t − μ), σ√((1 − exp[−2αT]) / (2α)))

which is very easy to simulate. The following plots show some typical behaviour for r_t. Typical values of α would be in the range 0.1-0.3.
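Because the integrated equation gives the exact distribution of r_{t+T}, simulation is a one-line update per period. A minimal Python sketch follows; the parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
mu, sigma, alpha = 0.0, 0.001, 0.1   # long-run mean, volatility and reversion speed (assumed)
dt, n_steps = 1.0, 200

r = np.empty(n_steps)
r[0] = 0.005                         # assumed starting return
sd = sigma * np.sqrt((1 - np.exp(-2 * alpha * dt)) / (2 * alpha))
for i in range(1, n_steps):
    mean = mu + np.exp(-alpha * dt) * (r[i - 1] - mu)
    r[i] = rng.normal(mean, sd)      # exact transition of the mean-reverting series over dt
print(r[:10])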
A slight modification to Equation (12.7) is called the Cox-Ingersoll-Ross or CIR model (Cox, Ingersoll and Ross, 1985), again used for short-term interest rates, and has the useful property of not allowing negative values (so we can use it to model the variable S) because the volatility goes to zero as S approaches zero:

dS = α(μ − S) dt + σ√S dz

Integrating over time, we get

S_{t+T} = Y / (2c)

where

c = 2α / (σ²(1 − exp[−αT]))

and Y is a non-central chi-square distribution with 4αμ/σ² degrees of freedom and non-centrality parameter 2cS_t exp[−αT]. This is a little harder to simulate since you need the uncommon non-central chi-square distribution in your simulation software, but it has the attraction of being tractable (we can precisely determine the form of the distribution for the variable S_{t+T}), which makes it easier to determine its parameters using maximum likelihood methods.

12.2.3 GBM with jump diffusion
Jump diffusion refers to sudden shocks to the variable that occur randomly in time. The idea is to recognise that, beyond the usual background randomness of a time series variable, there will be events that have a much larger impact on the variable, e.g. a CEO resigns, a terrorist attack takes place, a drug gets FDA approval. The frequency of the jumps is usually modelled as a Poisson distribution with intensity λ, so that in some time increment T there will be Poisson(λT) jumps. The jump size for r is usually modelled as Normal(μ_J, σ_J) for mathematical convenience and ease of estimating the parameters. Adding jump diffusion to the discrete time Equation (12.6) for one period, we get the following:

r_1 = Normal(μ − σ²/2, σ) + Σ_{i=1}^{Poisson(λ)} Normal(μ_J, σ_J)

If we define k = Poisson(λ), this reduces to

r_1 = Normal((μ − σ²/2) + kμ_J, √(σ² + kσ_J²))

or for T periods we have

r_T = Normal((μ − σ²/2)T + kμ_J, √(σ²T + kσ_J²)),   k = Poisson(λT)     (12.9)

which is easy to model with Monte Carlo simulation and easy to estimate parameters for by matching moments, although one must be careful to ensure that the λ estimate isn't too high (e.g. > 0.2) because the Poisson jumps are meant to be rare events, not form part of each period's volatility. The plot in Figure 12.13 shows a typical jump diffusion model giving both r and S values and with jumps marked as circles.
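A minimal Python sketch of a jump diffusion series of this kind follows; σ is an illustrative assumption, while the other parameter values follow the Figure 12.13 caption.

import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 0.01, 0.1                  # GBM drift; sigma is an assumed value
lam, mu_j, sigma_j = 0.02, 0.04, 0.2   # jump intensity and jump-size parameters
T, S0 = 300, 100.0

k = rng.poisson(lam, size=T)                         # number of jumps in each period
r = rng.normal(mu - sigma**2 / 2, sigma, size=T)     # diffusion part of each period's log return
r = r + rng.normal(k * mu_j, sigma_j * np.sqrt(k))   # add the sum of that period's jump sizes
S = S0 * np.exp(np.cumsum(r))
print(round(S[-1], 2), int(k.sum()), "jumps")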

12.2.4 GBM with jump diffusion and mean reversion
You can imagine that, if the return r has just received a large shock, there might well be a "correction"
over time that brings it back to the expected return p of the series. Combining mean reversion with
jump diffusion will allow us to model these characteristics quite well and with few parameters. However,
the additive model of Equation (12.9) for mean and variance no longer applies, particularly when the
reversion speed is large because one needs to model when within the period the jump took place: if it
was at the beginning of the period, it may well have already strongly reverted before one observes the
value at the period's end. The most practical solution, called Euler's method, is to split up a time period
into many small increments. The number of increments will be sufficient when the model produces the
same output for decision purposes as any greater number of increments.

12.3 Autoregressive Models
An ever-increasing number of autoregressive models are being developed in the financial area. The
ones of more general interest discussed here are AR, MA, ARMA, ARCH and GARCH, and it is more
standard to apply the models to the return r rather than to the stock price S. I also give the equations for
EGARCH and APARCH. Let me just repeat my earlier warning that, before being convinced that some
subtle variation of the model gives a genuine advantage, try generating a few samples for simpler models
that you have fit to the data and see whether they can create scenarios of a similar pattern. ModelRisk
offers functions that fit each of these series to data and produce forecasts. The data can be live linked
to historical values, which is very convenient for keeping your model automatically up to date.

Figure 12.13 Sample of a GBM with jump diffusion, with parameters μ = 0.01, μ_J = 0.04, σ_J = 0.2 and λ = 0.02.

12.3.1 AR
The equation for an autoregressive process of order p, or AR(p), is

where the ε_t are independent Normal(0, σ) random variables. Some constraints on the parameters {a_i} are needed if one wants to keep the model stationary (meaning the marginal distribution of r_t is the same for all t), e.g. for an AR(1), |a_1| < 1. In most situations, an AR(1) or AR(2) is sufficiently elaborate, i.e.:

You can see that this is just a regression model where r_t is the dependent variable and the r_{t−i} are the explanatory variables. It is usual, though not essential, that a_i > a_{i+1}, i.e. that r_t is explained more by more recent values (t − 1, t − 2, ...) than by older values (t − 10, t − 11, ...).

12.3.2 MA

The equation for a moving-average process of order q, or MA(q), is

This says that the variable r_t is normally distributed about a mean equal to

where the ε_t are independent Normal(0, σ) random variables again. In other words, the mean of r_t is the mean of the process as a whole, μ, plus some weighting of the variation of the q previous terms from the mean. Similarly to AR models, it is usual that b_i > b_{i+1}, i.e. that r_t is explained more by more recent terms (t − 1, t − 2, ...) than by older terms (t − 10, t − 11, ...).

12.3.3 ARMA

We can put the AR(p) and MA(q) processes together to create an autoregressive moving-average ARMA(p, q) process with mean μ that is described by the following equation:

In practice, the ARMA(1, 1) is usually sufficiently complex, so the equation simplifies to

12.3.4 ARCH and GARCH

ARCH models were originally developed to account for fat tails by allowing clustering of periods of volatility (heteroscedastic, or heteroskedastic, means "having different variances"). One of the assumptions in the regression models previously used for the analysis of high-frequency financial data was that the error terms have a constant variance. Engle (1982), who won the 2003 Nobel Memorial Prize for Economics, introduced the ARCH model, applying it to quarterly UK inflation data. ARCH was later generalised to GARCH by Bollerslev (1986), which has proven more successful in fitting to financial data. Let r_t denote the returns or return residuals and assume that r_t = μ + σ_t z_t, where the z_t are independent, Normal(0, 1) distributed, and σ_t is modelled by

σ_t² = ω + Σ_{i=1}^{q} a_i (r_{t−i} − μ)²

where ω > 0, a_i ≥ 0, i = 1, ..., q, and at least one a_i > 0. Then r_t is said to follow an autoregressive conditional heteroskedastic, ARCH(q), process with mean μ. It models the variance of the current error term as a function of the previous error terms (r_{t−i} − μ). Since each a_i ≥ 0, it has the effect of grouping low (or high) volatilities together.
If an autoregressive moving-average (ARMA) process is assumed for the variance, then r_t is said to be a generalised autoregressive conditional heteroskedastic GARCH(p, q) process with mean μ:

σ_t² = ω + Σ_{i=1}^{q} a_i (r_{t−i} − μ)² + Σ_{j=1}^{p} b_j σ_{t−j}²

where p is the order of the GARCH terms and q is the order of the ARCH terms, ω > 0, a_i ≥ 0, i = 1, ..., q, b_j ≥ 0, j = 1, ..., p, and at least one a_i or b_j > 0.
In practice, the model most generally used is a GARCH(1, 1):

σ_t² = ω + a_1 (r_{t−1} − μ)² + b_1 σ_{t−1}²
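A minimal Python sketch of simulating a GARCH(1, 1) process; the parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(5)
mu, omega, a1, b1 = 0.0005, 1e-6, 0.08, 0.90   # assumed GARCH(1, 1) parameters
n = 1000

r = np.empty(n)
var = omega / (1 - a1 - b1)                    # start at the long-run variance
for t in range(n):
    r[t] = mu + np.sqrt(var) * rng.standard_normal()
    var = omega + a1 * (r[t] - mu) ** 2 + b1 * var   # conditional variance for the next period
print(r.std(ddof=1), np.sqrt(omega / (1 - a1 - b1)))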

12.3.5 APARCH

The asymmetric power autoregressive conditional heteroskedasticity, APARCH(p, q), model was introduced by Ding, Granger and Engle (1993) and is defined as follows:

where −1 < γ_i < 1 and at least one a_i or b_j > 0. δ plays the role of a Box-Cox transformation of the conditional standard deviation σ_t, while the γ_i reflect the so-called leverage effect. APARCH has proved very promising and is now quite widespread because it nests several other models as special cases, e.g. the ARCH (δ = 1, γ_i = 0, b_j = 0), GARCH (δ = 2, γ_i = 0), TS-GARCH (δ = 1, γ_i = 0), GJR-GARCH (δ = 2), TARCH (δ = 1) and NARCH (b_j = 1, γ_i = 0).
In practice, the model most generally used is an APARCH(1, 1):

12.3.6 EGARCH
The exponential general autoregressive conditional heteroskedastic, EGARCH(p, q), model is another form of GARCH model, with the purpose of allowing negative values in the linear error variance equation. The GARCH model imposes non-negativity constraints on the parameters a_i and b_j, while there are no such restrictions on these parameters in the EGARCH model. In the EGARCH(p, q) model, the conditional variance σ_t² is formulated by an asymmetric function of the lagged disturbances r_t:

where

and

where z_t is a standard normal variable.
Again, in practice the model most generally used has p = q = 1, i.e. is an EGARCH(1, 1):

12.4 Markov Chain Models
Markov chains² comprise a number of individuals who begin in certain allowed states of the system and
who may or may not randomly change (transition) into other allowed states over time. A Markov chain
has no memory, meaning that the joint distribution of how many individuals will be in each allowed
state depends only on how many were in each state the moment before, not on the pathways that led
there. This lack of memory is known as the Markov property. Markov chains come in two flavours:
continuous time and discrete time. We will look at a discrete-time process first because it is the easiest
to model.

12.4.1 Discrete-time Markov chain
In a discrete-time Markov process the individuals can move between states only at set (usually equally
spaced) intervals of time. Consider a set of 100 individuals in the following four marital states:

43 are single;
29 are married;
11 are separated;
17 are divorced.
² Named after Andrey Markov (1856-1922), a Russian mathematician.


We write this as a vector:

{43, 29, 11, 17}
Given sufficient time (let's say a year) there is a reasonable probability that the individuals can change
state. We can construct a matrix of the transition probabilities as follows:
Transition matrix                    Is now:
                      Single    Married    Separated    Divorced
Was:  Single           0.85       0.12        0.02         0.01
      Married          0          0.88        0.08         0.04
      Separated        0          0.13        0.45         0.42
      Divorced         0          0.09        0.02         0.89
We read this matrix row by row. For example, it says (first row) that a single person has an 85 % chance of still being single 1 year later, a 12 % chance of being married, a 2 % chance of being separated and a 1 % chance of being divorced. Since these are the only allowed states (e.g. we haven't included "engaged", so that must be rolled up into "single"), the probabilities must sum to 100 %. Of course, we'd have to decide what a death would mean: the transition matrix could either be defined such that if a person dies they retain their marital status for this model, or we could make this a transition matrix conditional on them surviving a year.
Notice that the "single" column is all 0s, except the single/single cell, because, once one is married, the only states allowed after that are married, separated and divorced. Also note that one can go directly from single to separated or divorced, which implies that during that year the individual had passed through the married state. Markov chain transition matrices describe the probability that one is in a state at some precise time, given some state at a previous time, and are not concerned with how one got there, i.e. all the other states one might have passed through.
We now have the two elements of the model, the initial state vector and the transition matrix, to
estimate how many individuals will be in each state after a year. Let's go through an example calculation
to estimate how many people will be married in one year:
• for the single people, Binomial(43, 0.12) will be married;
• for the married people, Binomial(29, 0.88) will be married;
• for the separated people, Binomial(11, 0.13) will be married;
• for the divorced people, Binomial(17, 0.09) will be married.

Add together these four binomial distributions and we get an estimate of the number of people from our
group who will be married next year. However, the above calculation does not work when we want to
look at the joint distribution of how many people will be in each state: clearly we cannot add four sets
of four binomial distributions because the total must sum to 100 people. Instead, we need to use the
multinomial distribution. The number of people who were single but are now {Single, Married, Separated,
Divorced} equals Multinomial(43, {0.85, 0.12, 0.02, 0.01}). Applying the multinomial distribution for
the other three initial states, we can take a random sample from each multinomial and add up how many
are in each state, as shown in the model in Figure 12.14.
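A minimal Python sketch of this multinomial step, using the initial state vector and transition matrix above:

import numpy as np

rng = np.random.default_rng(6)
start = np.array([43, 29, 11, 17])                 # single, married, separated, divorced
P = np.array([[0.85, 0.12, 0.02, 0.01],
              [0.00, 0.88, 0.08, 0.04],
              [0.00, 0.13, 0.45, 0.42],
              [0.00, 0.09, 0.02, 0.89]])

# sample where the people from each initial state end up, then total by destination state
moves = np.array([rng.multinomial(n, p) for n, p in zip(start, P)])
print(moves.sum(axis=0))       # numbers single/married/separated/divorced after one year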

Figure 12.14 Multinomial method of performing a Markov chain model.

Let's now look at extending the model to predict further ahead in time, say 5 years. If we can assume
that the probability transition matrix remains valid for that period, and that nobody in our group dies,
we could repeat the above exercise 5 times - calculating in each year how many individuals are in each
state and using that as the input into the next year, etc. However, there is a more efficient method.
The probability that a person starting in state i is in state j after 2 years is determined by looking at the probability of the person going from state i to each state after 1 year, and then going from that state to state j in the second year. So, for example, the probability of changing from single to divorced after 2 years is

P(Single to Single) * P(Single to Divorced)
+ P(Single to Married) * P(Married to Divorced)
+ P(Single to Separated) * P(Separated to Divorced)
+ P(Single to Divorced) * P(Divorced to Divorced)

Notice how we have multiplied the elements in the first row (single) by the elements in the last column (divorced) and added them. This is the operation performed in matrix multiplication. We can therefore determine the probability transition matrix over the 2-year period by simply multiplying the 1-year transition matrix by itself (using Excel's MMULT function), as in the model in Figure 12.15.
When one wants to forecast T periods in advance, where T is large, performing the matrix multiplication (T - 1) times can become rather tedious, but there is some mathematics based on transforming the
matrix that allows one directly to determine the transition matrix over any number of periods. ModelRisk
provides some efficient means to do this: the VoseMarkovMatrix function calculates the transition matrix
for any time length, and the VoseMarkovSample goes the next step, simulating how many individuals
are in each final state after some period. In this next example (Figure 12.16) we calculate the transition
matrix and simulate how many individuals will be in each state after 25 years.
Notice how after 25 years the probability of being married is about 45 %, irrespective of what state
one started in: a similar situation occurs for separated and divorced. This stabilising property is very
common and, as a matter of interest, is the basis of a statistical technique discussed briefly elsewhere
in this book called Markov chain Monte Carlo. Of course, the above calculation does assume that the
transition matrix for 1 year is valid to apply over such a long period (a big assumption in this case).
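A minimal Python sketch of the matrix-power calculation (the MMULT equivalent), using the transition matrix above:

import numpy as np

P = np.array([[0.85, 0.12, 0.02, 0.01],
              [0.00, 0.88, 0.08, 0.04],
              [0.00, 0.13, 0.45, 0.42],
              [0.00, 0.09, 0.02, 0.89]])

P2 = np.linalg.matrix_power(P, 2)     # two-year transition matrix
P25 = np.linalg.matrix_power(P, 25)   # twenty-five-year transition matrix
print(np.round(P2, 4))
print(np.round(P25, 3))               # the rows become very similar - the stabilising property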

Figure 12.15 Multinomial method of performing a Markov chain model with time an integer > 1 unit (the two-year transition matrix is the one-year matrix multiplied by itself; e.g. single-to-single becomes 0.85² = 0.7225).

Figure 12.16 ModelRisk methods for performing a Markov chain model with time an integer > 1 unit. Key formulae: F11:I14 {=VoseMarkovMatrix(F4:I7,B11)}; K11:K14 (outputs) {=VoseMarkovSample(B4:B7,F4:I7,B11)}; B4:B7, F4:I7 and B11 are input data.

12.4.2 Continuous-time Markov chain
For a continuous-time Markov process we need to be able to produce the transition matrix for any
positive time increment, not just an integer multiple of the time that applies to the base transition

Chapter 12 Forecasting with uncertainty

343

matrix. So, for example, we might have the above marital status transition matrix for a single year but
wish to know what the matrix is for half a year, or 2.5 years.
There is a mathematical technique for finding the required matrix, based on converting the multinomial
probabilities in the matrix into Poisson intensities that match the required probability. The mathematical
manipulation is somewhat complex, particularly when one has to wrestle with numerical stability. The
ModelRisk functions VoseMarkovMatrix and VoseMarkovSample detect when you are using non-integer
time and automatically convert to the alternative mathematics. So, for example, we can have the model
described above for a half-year.

12.5 Birth and Death Models
There are two strongly related probabilistic time series models called the Yule (or pure birth) and pure
death models. We have certainly found them very useful in modelling numbers in a bacterial population,
but they could be helpful in modelling other variables where the number of individuals increases or decreases according to the population size.

12.5.1 Yule growth model
This is a pure birth growth model and is a stochastic analogue to the deterministic exponential growth models one often sees in, for example, microbial risk analysis. In exponential growth models, the rate of growth of a population of n individuals is proportional to the size of the population:

dn/dt = βn

where β is the mean rate of growth per unit time t. This gives the number of individuals n_t in the population after time t as

n_t = n_0 exp(βt)

where n_0 is the initial population size. The model is limited because it takes no account of any randomness in the growth. It also takes no account of the discrete nature of the population, which is important at low values of n. Moreover, there are no defensible statistical tests to apply to fit an exponential growth curve to observations (regression is often used as a surrogate) because an exponential growth model is not probabilistic, so no probabilistic (i.e. statistical) interpretation of data is possible.
The Yule model starts with the premise that individuals have offspring on their own (e.g. by division), that they procreate independently, that procreating is a Poisson process in time and that all individuals in the population are the same. The expected number of offspring from an individual per unit time (over some infinitesimal time increment) is defined as β. This leads to the result that an individual will have, after time t, Geometric(exp(−βt)) offspring, giving a new total population of Geometric(exp(−βt)) + 1. Thus, if we start with n_0 individuals, then by some later time t we will have

n_t = n_0 + NegBin(n_0, exp(−βt))

from the relationship

NegBin(s, p) = Σ_{i=1}^{s} Geometric(p)

with mean n̄_t = n_0 exp(βt), corresponding to the exponential growth model.
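A minimal Python sketch of the Yule model; the initial population and growth rate are illustrative assumptions, and note that numpy's negative_binomial counts failures before a fixed number of successes, which matches the offspring-count reading used here.

import numpy as np

rng = np.random.default_rng(7)
n0, beta = 50, 0.2                    # initial population and growth rate (assumed)
t = np.arange(0, 11)

# n_t = n0 + NegBin(n0, exp(-beta*t)): the summed offspring counts plus the original individuals
n_t = [n0 + rng.negative_binomial(n0, np.exp(-beta * ti)) for ti in t]
print(n_t)
print([round(n0 * np.exp(beta * ti), 1) for ti in t])   # exponential-growth means for comparison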


A possible problem in implementing this type of model is that n_0 and n_t can be very large, and simulation programs tend to produce errors for discrete distributions like the negative binomial for large input parameters and output values. ModelRisk has two time series functions to model the Yule process that work for all input values:

which generates values for n_t, and

VoseTimeSeriesYule10(Log10n0, LogIncrease, t)

which generates values for Log10(n_t), as one often finds it more convenient to deal with logs for exponentially growing populations because of the large numbers that can be generated. LogIncrease is the number of logs (in base 10) by which one expects the population to increase per time unit. The parameters β and LogIncrease are related by

LogIncrease = Log10[exp(β)]

12.5.2 Death model
The pure death model is a stochastic analogue to the deterministic exponential death models one often sees in, for example, microbial risk analysis. Individuals are assumed to die independently and randomly in time, following a Poisson process. Thus, the time until death can be described by an exponential distribution, which has a cdf

F(t) = 1 − exp(−λt)

where λ is the expected instantaneous death rate of an individual. The probability that an individual is still alive at time t is therefore

exp(−λt)

Thus, if n_0 is the initial population, the number n_t surviving until time t follows a binomial distribution:

n_t = Binomial(n_0, exp(−λt))

which has a mean of

n̄_t = n_0 exp(−λt)

i.e. the same as the exponential death model. The cdf for the time until extinction t_E of the population is given by

F(t_E) = (1 − exp(−λt_E))^{n_0}
The binomial death model offered here is an improvement over the exponential death model for
several reasons:
The exponential death model takes no account of any randomness in the growth, so cannot interpret
variations from an exponential line fit.


The exponential death model takes no account of the discrete nature of the population, which is
important at low values of n.
There are no defensible statistical tests to apply to fit an exponential growth curve to observations
(regression is often used as a surrogate) because an exponential model is not probabilistic, so there
can be no probabilistic interpretation of data. A likelihood function is possible, however, for the
death model described here.
A possible difficulty in implementing this death model is that n_0 and n_t can be very large, and simulation programs tend to produce errors for discrete distributions like the binomial for large input parameters and output values. ModelRisk has two time series functions to model the death model that eliminate this problem:

which generates values for n_t, and

VoseTimeSeriesDeath10(Log10n0, LogDecrease, t)

which generates values for Log10(n_t), as one often finds it more convenient to deal with logs for bacterial populations (for example) because of the large numbers that can be involved. The LogDecrease parameter is the number of logs (in base 10) by which one expects the population to decrease per time unit. The parameters λ and LogDecrease are related by

LogDecrease = λ Log10(e)
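A minimal Python sketch of the binomial death model; the initial population and death rate are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(8)
n0, lam = 1_000_000, 0.5              # initial population and death rate per unit time (assumed)
t = np.arange(0, 11)

n_t = rng.binomial(n0, np.exp(-lam * t))   # survivors at each time
print(n_t)
print(np.round(n0 * np.exp(-lam * t)))     # exponential-death means for comparison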

12.6 Time Series Projection of Events Occurring
Randomly in Time
Many things we are concerned about occur randomly in time: people arriving at a queue (customers, emergency patients, telephone calls into a centre, etc.), accidents, natural disasters, shocks to a market, terrorist attacks, particles passing through a bubble chamber (a physics experiment), etc. Naturally, we may want to model these over time, perhaps to figure out whether we will have enough vaccine stock, storage space, etc. The natural contender for modelling random events is the Poisson distribution - see Section 8.3 - which returns the number of random events occurring in time t when λ events are expected per unit time within t. Often we might think that the expected number of events may increase or decrease over time, so we make λ a function of t, as shown by the model in Figure 12.17.
A variation of this model is to take account of seasonality by multiplying the expected number of events by seasonal indices (which should average to 1).
In Section 8.3.7 I have discussed the Pólya and Delaporte distributions, which are counting distributions similar to the Poisson but which allow λ to be a random variable too. The Pólya is particularly helpful because, with one extra parameter, h, we can add some volatility to the expected number of events, as shown by the model in Figure 12.18.
Notice the much greater peaks in the plot for this model compared with that of the previous model in Figure 12.17. Mixing a Poisson with a gamma distribution to create the Pólya is a helpful tool because we can get the likelihood function directly from the probability mass function (pmf) of the Pólya and therefore fit to historical data. If the MLE value for h is very small, then the Poisson model will be as good a fit and has one less parameter to estimate, so the Pólya model is a useful first test.
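A minimal Python sketch of both count models with a linearly increasing expected intensity; the trend parameters are illustrative assumptions, and the Pólya-type series is generated as a gamma-mixed Poisson with the stated coefficient of variation.

import numpy as np

rng = np.random.default_rng(9)
t = np.arange(1, 51)
lam = 5.0 + 0.4 * t          # assumed linear trend in the expected number of events
h = 0.3                      # coefficient of variation of the intensity, as in Figure 12.18

poisson_counts = rng.poisson(lam)
# gamma-mixed Poisson: randomise the intensity around lam with CV = h, then count events
shape = 1.0 / h**2
polya_counts = rng.poisson(rng.gamma(shape, lam / shape))
print(poisson_counts[:10])
print(polya_counts[:10])     # typically shows larger peaks than the plain Poisson series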

Key formulae from the spreadsheet model: C6:C55 =Gradient*B6+Intercept; D6:D55 =VosePoisson(C6).

Figure 12.18 A Pólya time series with expected intensity λ as a linear function of time and a coefficient of variation of h = 0.3.


The linear equation used in the above two models to give an approximate description of the relationship of the expected number with time is often quite convenient, but one needs to be careful because a negative slope will ultimately produce a negative expected value, which is clearly nonsensical (which is why it is good practice to plot the expected value together with the modelled counts, as shown in the two figures above). The more correct Poisson regression model considers the log of the expected value of the number of counts to be a linear function of time, i.e.

ln(λ_t) = β_0 + β_1 t + ln(e)     (12.10)

where β_0 and β_1 are regression parameters. The ln(e) term in Equation (12.10) is included for data where the amount of exposure e varies between observations; for example, if we were analysing data to determine the annual increase in burglaries across a country where our data are given for different parts of the country with different population levels, or where the population size is changing significantly (so the exposure measure e would be person-years). Where e is constant, we can simplify Equation (12.10) to

ln(λ_t) = β_0 + β_1 t     (12.11)

The model in Figure 12.19 fits a Pólya regression to data (year <= 0) on annual sports accidents and projects out the next 3 years, where the population is considered constant so we can use Equation (12.11).

Figure 12.19 A Pólya regression model fitted to data and projected 3 years into the future. The LogL variable is optimised using Excel's Solver with the constraint that h > 0. ModelRisk offers Poisson and Pólya regression fits for multiple explanatory variables and variable exposure levels.
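A minimal Python sketch of fitting Equation (12.11) by maximum likelihood; the accident counts used here are hypothetical data, not the Figure 12.19 dataset.

import numpy as np
from scipy.optimize import minimize

years = np.arange(-9, 1)                                       # historical years
counts = np.array([12, 15, 14, 18, 17, 21, 19, 24, 23, 27])    # hypothetical accident counts

def neg_log_lik(beta):
    lam = np.exp(beta[0] + beta[1] * years)      # log link: ln(lambda_t) = beta0 + beta1*t
    return np.sum(lam - counts * np.log(lam))    # Poisson negative log-likelihood (up to a constant)

fit = minimize(neg_log_lik, x0=[np.log(counts.mean()), 0.0], method="Nelder-Mead")
b0, b1 = fit.x
print(np.round(np.exp(b0 + b1 * np.arange(1, 4)), 1))   # expected counts for the next 3 years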


12.7 Time Series Models with Leading Indicators
Leading indicators are variables whose movement has some relationship to the movement of the variable
you are actually interested in. The leading indicator may move in the same or opposite direction as the
variable of interest, as shown in Figure 12.20.
In order to evaluate the leading indicator relationship, you will have to determine:
the causal relationship;
the quantitative nature of the relationship.
The causal relationship is critical. It gives a plausible argument for why the movement in the leading
indicator should in some way presage the movement of the variable of interest. It will be very easy to
find apparent leading indicator patterns if you try out enough variables, but, if you can't logically argue
why there should be any relationship (preferably make the argument before you do the analysis on the potential indicator variable; it is much easier to convince yourself of a causal argument once you've seen a temptingly strong statistical correlation), it's likely that the observed relationship is spurious.
The quantitative nature of the relationship should come from a mixture of analysis of historic data and
practical thinking. Some leading indicators will have a cumulative effect over time (e.g. rainfall as an
indicator of the water available for use at a hydroelectric plant) and so need to be summed or averaged.
Other leading indicators may have a shorter response time to the same, perhaps unmeasurable, causal
variable as the variable in which you are interested (if the causal variable was measurable, you would use
that as the leading indicator instead), and so your variable may exhibit the same pattern with a time lag.
The analysis of historic data to determine the leading indicator relationship will depend largely on
the type of causal relationship. Linear regression is one possible method, where one regresses historic
values of the variable of interest against the lead indicator values, with either a specific lag time if
that can be causally deduced or with a varying lag time to produce the greatest r-squared fit if one is
estimating the lag time. Note that any forecast can only be made a distance into the future equal to the
lag time: otherwise one needs to make a forecast of the lead indicator too.
The model in Figure 12.21 provides a fairly simple example in which the historic data (used to create
the left pane of Figure 12.20 below) of the variable of interest Y are compared visually with lead
indicator X data for different lag periods. The closest pattern match occurs for a lag of 11 periods
(Figure 12.22).

Figure 12.20 Lead indicator patterns: left - lead indicator variable is positively correlated with the variable of interest; right - negatively correlated.

Figure 12.21 Leading indicator fit and projection model. Fitted values: R-squared = 0.9715, slope (m) = 0.045557, intercept (c) = -0.017818, SteYX (syx) = 0.163501. Key formulae: =SLOPE($E$5:$E$83,$C$5:$C$83), =INTERCEPT($E$5:$E$83,$C$5:$C$83), =STEYX($E$5:$E$83,$C$5:$C$83).

Figure 12.22 Overlay of variable of interest and lead indicator variable lagged by 10, 11 and 12 periods, showing the closest pattern correlation at 11 periods.

Figure 12.22 Continued.

A scatter plot of Y(t) against X(t − 11) shows a strong linear relationship, so a least-squares regression seems appropriate (Figure 12.23).
The regression parameters are:

slope = 0.04555
intercept = -0.01782
SteYX = 0.1635

(We could use the linear regression parametric bootstrap to give us uncertainty about these parameters if we wished.)
The resultant model is then

Y(t) = slope * X(t − 11) + intercept + Normal(0, SteYX)

which we can use to predict {Y(1), ..., Y(11)}:

Figure 12.23 Scatter plot of variable of interest observations against lead indicator observations lagged by 11 periods.

12.8 Comparing Forecasting Fits for Different Models
There are three components to evaluating the relative merits of the various forecasting models fitted to data. The first is to take an honest look at the data you are going to fit: do they come from
a world that you think is similar to the one you are forecasting into? If not (e.g. there are fewer
companies in the market now, there are stricter controls, the product for which you are forecasting
sales is getting rather old and uninteresting, etc.), then consider some of the forecasting techniques I
describe in Chapter 17, which are based more on intuition than mathematics and statistics. The second
step is also common sense: ask yourself whether the assumptions behind the model could actually
be true and why that might be. Perhaps you can investigate whether this type of model has been
used successfully for similar variables (e.g. a different exchange rate, interest rate, share price, water
level or hurricane frequency from the one you are modelling). In fact, I recommend that you use
this as a first step in selecting which models might be appropriate for the variable you are modelling.
The third is to evaluate statistically the degree to which each model fits the data, compensating
for the fact that a model with more parameters will have greater flexibility to fit the data
but may not mean anything. Statistical techniques for model selection and comparison have improved,
and the best methods now use "information criteria", of which there are three in common usage, described
at the end of Section 10.3.4. The main advantage over the older log-likelihood ratio method is that the
models don't have to be nested - meaning that each tested model does not need to be a simplified
(some parameters removed) version of a more complex model. For ARCH, GARCH, APARCH and
EGARCH you should subtract n(1 + ln[2π]), where n is the number of data points, from each of the
criteria. If you fit a number of models to your data, try not to pick automatically the model with the
best statistical result, particularly if the top two or three are close. Also, simulate projections out into
the future and see whether the range and behaviour correspond to what you think is realistic (you
can do this automatically in the time series fitting window in ModelRisk, overlaying any number of
paths).
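For reference, a minimal sketch of the penalised-fit idea behind such criteria, assuming the usual textbook definitions of AIC, BIC (SIC) and HQIC rather than the exact expressions reported by any particular package:

import math

def information_criteria(log_likelihood, k, n):
    """AIC, BIC (SIC) and HQIC for a model with k fitted parameters and n data points.
    In this convention, lower values indicate a better trade-off of fit against complexity."""
    aic = 2 * k - 2 * log_likelihood
    bic = k * math.log(n) - 2 * log_likelihood
    hqic = 2 * k * math.log(math.log(n)) - 2 * log_likelihood
    return {"AIC": aic, "BIC": bic, "HQIC": hqic}

# Example: compare two hypothetical time series fits to the same 250 observations.
print(information_criteria(log_likelihood=-412.7, k=2, n=250))
print(information_criteria(log_likelihood=-405.1, k=5, n=250))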


12.9 Long-Term Forecasting
By long-term forecasts I mean making projections out into the future that span more than, say, 20-30 %
of your historical experience. I am not a big believer in using very technical models in these situations.
For a start, there should be a lot of uncertainty to the projections, but more importantly the world is
ever-changing and the key assumption you implicitly make by producing a forecast with a model fitted
to historic data is that the world will carry on behaving in the same way. I know that historically I
have been hopeless at predicting what my life will be like in 5 years time: in 1985 I fully expected to
be a physical oceanographer in the UK; in 1987 I'd become a qualified photographer living in New
Zealand, etc. I'd fixed on being a risk analyst by 1988, but then moved to the UK, Ireland, France and
now Belgium.¹ Five years ago I had no idea that our company would have grown in the way it has,
or that we would have developed such a strong software capability. Try applying the same test to the
world you are attempting to model.
The alternative is to combine lessons learned from the past (e.g. how sensitive your sales are to the US
economy) with a good look around to see how the world is changing (mergers coming up, wars starting
or ending, new technology, etc.) and draw up scenarios of what the world might look like and how it
would affect the variables you want to forecast. I give a number of techniques for this in Chapter 14.

¹ Now I have three kids, a partner, a nice home, a dog and an estate car, so maybe things are settling down.

Chapter 13

Modelling correlation and dependencies
13.1 Introduction
In previous chapters we have looked at building a risk analysis model and assigning distributions to
various components of the model. We have also seen how risk analysis models are more complex than
the deterministic models they are expanding upon. The chief reason for this increase in complexity is
that a risk analysis model is dynamic. In most cases there are a potentially infinite number of possible
combinations of scenarios that can be generated for a risk analysis model. We have seen in Chapter 4
that a golden rule of risk analysis is that each one of these scenarios must be potentially observable
in real life. The model, therefore, must be restricted to prevent it from producing, in any iteration, a
scenario that could not physically occur.
One of the restrictions we must place on our model is to recognise any interdependencies between
its uncertain components. For example, we may have both next year's interest rate and next year's
mortgage rate represented as distributions. Figure 13.1 gives an example of two distributions modelling
these interest rate and mortgage rate predictions. Clearly, these two components are strongly positively
correlated, i.e. if the interest rate turns out to be at the high end of the distribution, the mortgage rate
should show a correspondingly high value. If we neglect to model the interdependency between these
two components, the joint probabilities of the various combinations of these two parameters will be
incorrect. Impossible combinations will also be generated. For example, a value for the interest rate of
6.5 % could occur with a value for the mortgage rate of 5.5 %.
There are three reasons why we might observe a correlation between observed data. The first is
that there is a logical relationship between the two (or more) variables. For example, the interest rate
statistically determines the mortgage rate, as discussed above. The second is that there is another external
factor that is affecting both variables. For example, the weather during construction of a building will
affect how long it takes both to excavate the site and to construct the foundations. The third reason is
that the observed correlation has occurred purely by chance and no correlation actually exists. Chapter 6
outlines some statistical confidence tests to help determine whether the observed correlations are real.
However, there are many examples of strong correlation between variables that would pass any tests
of significance but where there is no relationship between the variables. For example, the number of
personal computer users in the UK over the last 8 years and the population of Asia will probably be
strongly correlated - not because there is any relationship but because both have steadily increased over
that period.

Figure 13.1 Distributions of interest and mortgage rate predictions.

13.1.1 Explanation of dependency, correlation and regression
The terms dependency, correlation, and regression are often used interchangeably, causing some confusion, but they have quite specific meanings. A dependency relationship in risk analysis modelling is
where the sampled value from one variable (called the independent) has a statistical relationship that
approximately determines the value that will be generated for the other variable (called the dependent).
A statistical relationship has an underlying or average relationship between the variables around which
the individual observations will be scattered. Its chief difference to correlation is that it presumes a causal
relationship. As an example, the interest rate and mortgage rate will be highly correlated. Moreover, the
mortgage rate will be in essence dependent on the interest rate, but not the other way round.
Correlation is a statistic used to describe the degree to which one variable is related to another.
Pearson's correlation coefficient (also known as Pearson's product moment correlation coefficient) is
given by

r(X, Y) = Cov(X, Y) / (σ(X) σ(Y))

where Cov(X, Y) is the covariance between datasets X and Y, and σ(X) and σ(Y) are the sample
standard deviations as defined in Chapter 6. Correlation can be considered to be a normalised covariance
between the two datasets: dividing by the standard deviation of each dataset produces a unitless index
between -1 and +1. The correlation coefficient is frequently used alongside a regression analysis to
measure how well the regression line explains the observed variations of the dependent variable. The
above correlation statistic is not to be confused with Spearman's rank order correlation coefficient, which
provides an alternative, non-parametric approach to measuring the correlation between two variables.
A little care is needed in interpreting covariance. Independent variables are always uncorrelated, but
uncorrelated variables are not always independent. A classic, if somewhat theoretical, example is to
consider the variables X = Uniform(-1, 1) and Y = X². There is a direct link between X and Y, but they
have zero covariance since Cov(X, Y) = E[XY] - E[X]E[Y]¹ (the definition) = E[X³] - E[X]E[X²],
and both E[X] and E[X³] are zero. This is one reason we look at scatter plots of data as well as calculating
correlation statistics.

¹ E[·] denotes the expectation, i.e. the mean of all values weighted by their probability.
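A quick numerical illustration of this point, using simulated values rather than any dataset from the text:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100_000)
y = x ** 2                       # perfectly dependent on x

# Sample covariance and correlation are essentially zero despite the exact dependence.
print(np.cov(x, y)[0, 1])        # ~0
print(np.corrcoef(x, y)[0, 1])   # ~0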

Regression is a mathematical technique used to determine the equation that relates the independent
and dependent variables with the least margin of error. If we were to plot a scatter plot of the available
data, this equation would be represented by a line that passed as close as possible through the data points
(see Figure 13.2). The most common technique is that of simple least-squares linear regression. This
objectively determines the straight line (Y = ax + b) such that the sum of the squares of the vertical
deviations of the data points from the line is a minimum. The assumptions, mathematics and statistics
relating to least-squares linear regression are provided in Section 6.3.9.


13.1.2 General comments on dependency modelling
The remainder of this chapter offers several techniques for modelling correlation and dependencies
between uncertain components, with examples of where and how they are used. The sections on rank
order correlation and copulas provide techniques for modelling correlation. The other sections offer
techniques for dependency modelling. The analyst will need to determine whether it is important to
focus on any particular correlation or dependency structure in the model. A simple way to determine
this is to run two simulations, one with a zero rank order correlation and one with a +1 or - 1 correlation,
using two approximate distributions to define the correlated pair. If the model's results from these two
simulations are significantly different, the correlation is obviously an important component of the general
model.
Scatter plots are an extremely useful way of visualising the form of a correlation or dependency.
The common practice is to plot observed data for the independent (when known) variable on the x axis

Figure 13.3 Examples of dependency patterns.

and corresponding data for the dependent (again, when known) variable on the y axis. Figure 13.3
illustrates four dependency patterns that you may meet: top left - positive linear; top right - negative
linear; bottom left - positive curvilinear; and bottom right - mixed curvilinear.
Scatter plots also provide an excellent way of previewing a correlation pattern that you have defined
in your own models. Most risk analysis packages allow the user to export the Monte Carlo generated
values for any component in your model to the Windows clipboard or directly into a spreadsheet. The
data can then be plotted in a scatter plot using the standard spreadsheet-charting facilities. The number of
iterations (and therefore the number of generated data points) should be set to a value that will produce
a scatter plot that fills out the low-probability areas reasonably well while avoiding overpopulation of
the high-probability areas. High-resolution screens now make it reasonable to plot around 3000 data
points as little dots that will show the pattern and give an impression of density quite nicely.

13.2 Rank Order Correlation
Most risk analysis software products now offer a facility to correlate probability distributions within a
risk analysis model using rank order correlation. The technique is very simple to use, requiring only
that the analyst nominates the two distributions that are to be correlated and a correlation value between
-1 and +1. This coefficient is known as Spearman's rank order correlation coefficient:
• A correlation value of -1 forces the two probability distributions to be exactly negatively correlated, i.e. the X percentile value in one distribution will appear in the same iteration as the (100 - X) percentile value of the other distribution.
• A correlation value of +1 forces the two probability distributions to be exactly positively correlated, i.e. the X percentile value in one distribution will appear in the same iteration as the X percentile value of the other distribution. In practice, one rarely uses correlation values of -1 and +1.
• Negative correlation values between 0 and -1 produce varying degrees of inverse correlation, i.e. a low value from one distribution will correspond to a high value in the other distribution, and vice versa. The closer the correlation to zero, the looser will be the relationship between the two distributions.
• Positive correlation values between 0 and +1 produce varying degrees of positive correlation, i.e. a low value from one distribution will correspond to a low value in the other distribution and a high value from one distribution will correspond to a high value from the other.
• A correlation value of 0 means that there is no relationship between the two distributions.

13.2.1 How rank order correlation works
The rank order correlation coefficient uses the ranking of the data, i.e. what position (rank) the data
point takes in an ordered list from the minimum to maximum values, rather than the actual data values
themselves. It is therefore independent of the distribution shapes of the datasets and allows the integrity
of the input distributions to be maintained. Spearman's ρ is calculated as

ρ = 1 - 6 Σ ΔR² / (n(n² - 1))

where n is the number of data pairs and ΔR is the difference in the ranks between data values in
the same pair. This is in fact a short-cut formula where there are few or no ties: the exact formula is
discussed in Section 6.3.10.
Example 13.1

The spreadsheet in Figure 13.4 calculates Spearman's ρ for a small dataset. This correlation coefficient is symmetric in the two distributions being correlated, i.e. only the difference between ranks

Figure 13.4 An example of the calculation of Spearman's rank order correlation coefficient (20 data pairs, giving ρ = 0.72).

is important and not whether distribution A is being correlated with distribution B or the other way
round. ♦
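For readers working outside a spreadsheet, here is a short Python sketch of the same short-cut calculation with ties averaged; the five data pairs are purely illustrative:

import numpy as np

def average_ranks(values):
    """Rank from 1..n, giving tied values the average of the ranks they span."""
    values = np.asarray(values, dtype=float)
    order = values.argsort()
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        ranks[values == v] = ranks[values == v].mean()
    return ranks

def spearman_rho(a, b):
    """Short-cut formula: rho = 1 - 6*sum(dR^2) / (n*(n^2 - 1))."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    return 1 - 6 * np.sum((ra - rb) ** 2) / (n * (n ** 2 - 1))

a = [90.86, 110.89, 86.84, 92.24, 95.88]
b = [77.57, 95.04, 66.35, 71.11, 75.90]
print(spearman_rho(a, b))   # 0.7 for these five illustrative pairs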
In order to apply rank order correlation to a pair of probability distributions, risk analysis software
has to go through several steps. Firstly, a number of rank scores equivalent to the number of iterations
is generated for each distribution that is to be correlated. Secondly, these rank score lists are jumbled
up so that the specified correlation is achieved between correlated pairs. Thirdly, the same number of
samples are drawn from each distribution and sorted from minimum to maximum. Finally, these values
are used during the simulation: the first to be used has the same ranking in the list as the first value in
its rank score list, and so on, until all rank scores and all generated values have been used.
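The following Python sketch mimics those steps. It is not the exact algorithm of any particular package (the rank scores here are ordered using correlated normal scores, in the spirit of the Iman and Conover approach discussed below), and the function name is mine:

import numpy as np

def correlate_by_ranks(samples_a, samples_b, rank_corr, rng):
    """Re-pair two independently generated sample sets so that they exhibit
    approximately the requested rank correlation, while leaving each
    marginal distribution untouched."""
    n = len(samples_a)
    # Correlated "scores" whose ranks dictate how the sorted samples are paired.
    scores = rng.multivariate_normal([0, 0], [[1, rank_corr], [rank_corr, 1]], size=n)
    rank_a = scores[:, 0].argsort().argsort()   # rank positions 0..n-1
    rank_b = scores[:, 1].argsort().argsort()
    return np.sort(samples_a)[rank_a], np.sort(samples_b)[rank_b]

rng = np.random.default_rng(7)
a, b = correlate_by_ranks(rng.lognormal(0, 0.5, 5000),
                          rng.uniform(0, 10, 5000), 0.8, rng)

Because only the pairing of the sorted samples changes, the two marginal distributions are reproduced exactly, which is the "distribution independent" property discussed next.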

13.2.2 Use, advantages and disadvantages of rank order correlation
Rank order correlation provides a very quick and easy to use method of modelling correlation between
probability distributions. The technique is "distribution independent", i.e. it has no effect on the shape
of the correlated distributions. One is therefore guaranteed that the distributions used to model the
correlated variables will still be replicated.
The primary disadvantage of rank order correlation is the difficulty in selecting the appropriate correlation coefficient. If one is simply seeking to reproduce a correlation that has been observed in previous
data, the correlation coefficient can be calculated directly from the data using the formula in the previous
section. The difficulty appears when attempting to model an expert's opinion of the degree of correlation
between distributions. A rank order correlation lacks intuitive appeal, and it is therefore very difficult
for experts to decide which level of correlation best represents their opinion.
This difficulty is compounded by the fact that the same degree of correlation will look quite different
on a scatter plot for different distribution types, e.g. two lognormals with a 0.7 correlation will produce
a different scatter pattern to two uniform distributions with the same correlation. Determining the
appropriate correlation coefficient is more difficult still if the two distributions do not share the same
geometry, e.g. one is normal and the other uniform, or one is a negatively skewed triangle and the
other a positively skewed triangle. In such cases, the scatter plot will often show quite surprising results
(Figure 13.5 illustrates some examples).
Figure 13.6 shows that correlation only becomes visually evident at levels of about 0.5 or above
(or about -0.5 or below for negative correlation). Producing scatter plots like this at various levels of
correlation for two variables can help subject matter experts provide estimates of levels of correlation
to be applied.
Another disadvantage of rank order correlation is that it ignores any causal relationship between the
two distributions. It is usually more logical to think of a dependency relationship along the lines of that
described in Sections 13.4 and 13.5.
A further disadvantage of which most people are unaware is that an assumption of the correlation shape
has already been built into the simulation software. The programming technique was originally developed
in a seminal paper by Iman and Conover (1982), who used an intermediate step of translating the random
numbers through van der Waerden scores. Iman and Conover found that these scores produced "natural-looking"
correlations: variables correlated using van der Waerden scores produced elliptical-shaped
in the middle and fanned out at each end. For example, correlating two Uniform(0, 1) distributions
together (the same as plotting the cdfs of any two continuous rank order correlated distributions) produces
the patterns in Figure 13.7.


Figure 13.5 Examples of patterns produced by correlating different distribution types with a rank order correlation of 0.8.

Figure 13.6 Patterns produced by two normal distributions with varying degrees of rank order correlation (the panels show correlations of 0, 0.2, 0.4, 0.6, 0.8 and 0.99).

Figure 13.7 Patterns produced by two Uniform(0, 1) distributions with varying degrees of rank order correlation (0.5, 0.8, 0.9 and 0.95).

Notice that the patterns are symmetric about the diagonals of Figure 13.7. In particular, rank order
correlation will "pinch" the variables to the same extent at each extreme. In fact there are a wide variety
of different patterns that could give us the same level of rank correlation. To illustrate the point, the
following plots in Figure 13.8 give the same 0.9 correlation as the bottom-left pane of Figure 13.7, but
are based on copulas which I discuss in the next section.
There are times when two variables are perhaps much more correlated at one end of their distributions than at the other. In financial markets, for example, we might believe that returns from two
correlated stocks of companies in the same area (let's say mobile phone manufacture) are largely uncorrelated except when the mobile phone market takes a huge dive, in which case the returns are highly
correlated. Then the Clayton copula in Figure 13.8 would be a much better candidate than rank order
correlation.
The final problem with rank order correlation is that it is a simulation technique rather than a probability model. This means that, although we can calculate the rank order correlation between variables
(ModelRisk has the VoseSpearman function to do this; it is possible in Excel but one has to create a
large array to do it), and although we can use a bootstrap technique to gauge the uncertainty about that


Figure 13.8 Patterns produced by different copulas (Frank, Clayton, Gumbel and T copula with ν = 2) with an equivalent 0.9 rank order correlation.

correlation coefficient (VoseSpearmanU), it is not possible to compare correlation structures statistically;
for example, it is not possible to use maximum likelihood methods and produce goodness-of-fit statistics. Copulas, on the other hand, are probability models and can be compared, ranked and tested for
significance.
In spite of the inherent disadvantages of rank order correlation, its ease of use and its speed of
implementation make it a very practical technique. In summary, the following guidelines in using rank
order correlation will help ensure that the analyst avoids any problems:
• Use rank order correlation to model dependencies that only have a small impact on your model's results. If you are unsure of its impact, run two simulations: one with the selected correlation coefficient and one with zero correlation. If there is a substantial difference between the model's final results, you should choose one of the other, more precise techniques explained later in this chapter.
• Wherever possible, restrict its use to pairs of similarly shaped distributions.
• If differently shaped distributions are being correlated, preview the correlation using a scatter plot before accepting it into the model.
• If using subject matter experts (SMEs) to estimate correlations, use charts at various levels of correlation to help the expert determine the appropriate level of correlation.
• Consider using copulas if the correlation is important or shows an unusual pattern.
• Avoid modelling a correlation where there is neither a logical reason nor evidence for its existence.
This last point is a contentious issue, since many would argue that it is safer to assume a 100 % positive
or negative correlation (whichever increases the spread of the model output) rather than zero. In my
view, if there is neither a logical reason that would lead one to believe that the variables are related in
some way nor any statistical evidence to suggest that they are, it seems that one would be unjustified in
assuming high levels of correlation. On the other hand, using levels of correlation throughout a model
that maximise the spread of the output, and other correlation levels that minimise the spread of the
output, does provide us with bounds within which we know the true output distribution(s) must lie. This
technique is sometimes used in project risk analysis, for example, where for the sake of reassurance one
would like to see the most widely spread output feasible given the available data and expert estimates.
I suspect that using such pessimistic correlation coefficients proves helpful because it in some general
way compensates for the tendency we all have to be overconfident about our estimates (of time to
complete the project's tasks, for example, thereby reducing the distribution of possible outcomes for
the model outputs like the date of completion) as well as quietly recognising that there are elements
running through a whole project like management competence, team efficiency and quality of the initial
planning - factors that it would be uncomfortable to model explicitly.

13.2.3 Uncertainty about the value of the correlation coefficient
We will often be uncertain about the level of rank order correlation to apply. We will be guided by
either available data or expert opinion. In the latter case, determining an uncertainty distribution for the
correlation coefficient is simply a matter of asking for a subject matter expert to estimate a feasible
correlation coefficient: perhaps just minimum, most likely and maximum values which can then be fed
into a PERT distribution, for example. The expert can be helped in providing these three values by
being shown scatter plots of various degrees of correlation for the two variables of interest.
In the case where data are available on which the estimate of the level of correlation is to be based,
we need some objective technique for determining a distribution of uncertainty for the correlation
coefficient. Classic statistics and the bootstrap both provide techniques that accomplish this. In classical
statistics, the uncertainty about the correlation coefficient, given the dataset {xᵢ}, {yᵢ}, i = 1, ..., n,
was shown by R. A. Fisher to be as follows (Paradine and Rivett, 1964, pp. 208-210):

ρ = tanh(Normal(tanh⁻¹(r), 1/√(n - 3)))

where tanh is the hyperbolic tangent, tanh⁻¹ is the inverse hyperbolic tangent, r is the rank correlation
of the set of observations and ρ is the true rank correlation between the two variables.
The bootstrap technique that applies here is the same technique usually used to estimate some statistic, except that we have to sample the data in pairs rather than individually. Figure 13.9 illustrates a
spreadsheet where this has been done. Note that the formula that calculates the rank is modified from
the Excel function RANK(), since this function assigns the same lowest-value rank to all data values
that are equal: in calculating ρ we require the ranks of tied data values to equal the average of the ranks


Figure 13.9 Model to determine uncertainty of a correlation coefficient using the bootstrap. Key formulae from the model (25 data pairs, sorted in ascending order of x):

Rank with ties averaged:      =RANK(B4,B$4:B$28)+(COUNTIF(B$4:B$28,B4)-1)/2
Spearman's ρ of the data:     {=1-6*SUM((E4:E28-D4:D28)^2)/(25*(25^2-1))}
Bootstrap resample of x:      =VoseDuniform(B$4:B$28)
Matched y value:              =VLOOKUP(F4,B$4:C$28,2)
Fisher (classical) estimate:  =TANH(VoseNormal(ATANH(E29),1/SQRT(22)))

that the tied values would have had if they had been infinitesimally separated. So, for example, the
dataset {1, 2, 2, 3, 3, 3, 4} would be assigned the ranks {1, 2.5, 2.5, 5, 5, 5, 7}. The 2s have to share the
ranks 2 and 3, so get allocated the average 2.5. The 3s have to share the ranks 4, 5, 6, so get allocated
the average 5. The Duniform distribution has been used randomly to sample from the {xᵢ} values, and
the VLOOKUP() function has been used to sample the {yᵢ} values to ensure that the data are sampled in
appropriate pairs. For this reason, the data pairs have to be ranked in ascending order by {xᵢ} so that the
VLOOKUP function will work correctly. Note in cell I30 that the uncertainty distribution for the correlation coefficient is calculated for comparison using the traditional statistics technique above. While the
results from the two techniques will not normally be in exact agreement, the difference is not excessive
and they will return almost exactly the same mean values. The ModelRisk function VoseSpearmanU
simulates the bootstrap estimate directly.
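A Python sketch of the same paired bootstrap, with Fisher's classical result alongside for comparison; the 25 data pairs are simulated stand-ins, and scipy's spearmanr is used because it already averages tied ranks:

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)

# Simulated stand-in for the 25 observed (x, y) pairs.
x = rng.normal(100, 10, 25)
y = 0.3 * x + rng.normal(0, 3, 25)
n = len(x)

r_obs = spearmanr(x, y)[0]

# Paired bootstrap: resample whole (x, y) pairs, recompute rho each time.
boot = np.empty(5000)
for i in range(boot.size):
    idx = rng.integers(0, n, n)
    boot[i] = spearmanr(x[idx], y[idx])[0]

# Fisher's classical result for comparison: rho = tanh(Normal(atanh(r), 1/sqrt(n-3))).
fisher = np.tanh(rng.normal(np.arctanh(r_obs), 1 / np.sqrt(n - 3), 5000))

print(np.percentile(boot, [5, 50, 95]))
print(np.percentile(fisher, [5, 50, 95]))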
Uncertainty about correlation coefficients can only be included by running multiple simulations, if one
uses rank order correlation. As discussed previously (Chapter 7), simulating uncertainty and randomness
together produces a single combined distribution that quite well expresses the total indeterminability of
our output, but without showing the degree due to uncertainty and that due to randomness. However, it
is not possible to do this with uncertainty about rank order correlation coefficients, as the scores used to
simulate the correlation between variables are generated before the simulation starts. If one is intending
to simulate uncertainty and randomness together, a representative value for the correlation needs to be
determined, which is not easy because of the difficulty of assessing the effect of a correlation coefficient
on a model's output(s). The reader may choose to use the mean of the uncertainty distribution for the
correlation coefficient or may choose to play safe and pick a value somewhere at an extreme, say the
5th or 95th percentile, whichever is the most conservative for the purposes of the model.


13.2.4 Rank order correlation matrices
An important benefit of rank order correlation is that one can apply it to a set of several variables
together. In this case, we must construct a matrix of correlation coefficients. Each distribution must
clearly have a correlation of 1.0 with itself, so the top-left to bottom-right diagonal elements are all 1.0.
Furthermore, because the formula for the rank order correlation coefficient is symmetric, as explained
above, the matrix elements are also symmetric about this diagonal line.
Example 13.2

Figure 13.10 shows a simple example for a three-phase engineering project. The cost of each phase is
considered to be strongly correlated with the amount of time it takes to complete (0.8). The construction
time is moderately correlated (0.5) with the design time: it is considered that the more complex the
design, the longer it will take to finish the design and construct the machine, etc. ♦
There are some restrictions on the correlation coefficients that may be used within the matrix. For
example, if A and B are highly positively correlated and B and C are also highly positively correlated, A
and C cannot be highly negatively correlated. For the mathematically minded, the restriction is that there
can be no negative eigenvalues for the matrix. In practice, the risk analysis software should determine
whether the values entered are valid and alter your entries to the closest allowable values or, at least,
reject the entered values and post a warning.
While correlation matrices suffer from the same drawbacks as those outlined for simple rank order
correlation, they are nonetheless an excellent way of producing a complex multiple correlation that is
laborious and quite difficult to achieve otherwise.
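As an illustration of the eigenvalue restriction (your simulation software will normally perform this check for you), here is a small Python sketch that tests a proposed matrix and applies a crude repair by clipping negative eigenvalues; the matrix shown is deliberately infeasible:

import numpy as np

def check_and_repair(corr):
    """A correlation matrix is valid only if it has no negative eigenvalues.
    If it does, clip them to zero and rescale back to a unit diagonal (a crude fix)."""
    corr = np.asarray(corr, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(corr)
    if eigvals.min() >= -1e-12:
        return corr, True
    fixed = eigvecs @ np.diag(np.clip(eigvals, 0, None)) @ eigvecs.T
    d = np.sqrt(np.diag(fixed))
    fixed = fixed / np.outer(d, d)          # restore 1.0 on the diagonal
    return fixed, False

# A, B and C: A-B and B-C strongly positive but A-C strongly negative -> infeasible.
proposed = [[1.0, 0.9, -0.9],
            [0.9, 1.0, 0.9],
            [-0.9, 0.9, 1.0]]
repaired, was_valid = check_and_repair(proposed)
print(was_valid)            # False
print(np.round(repaired, 2))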

Figure 13.10 An example of a rank order correlation matrix.


Adding uncertainty to a correlation matrix

Uncertainty about the correlation coefficients in a correlation matrix can be easily added when there are
data available. The technique requires a repeated application of the bootstrap procedure described in the
previous section for determining the uncertainty about a single parameter.
Example 13.3

Figure 13.11 provides a spreadsheet model where a dataset for three variables is used to determine the
correlation coefficient between each variable. By using the bootstrap method, we retain the correlation
between the uncertainty distributions of each correlation coefficient automatically. Cells C32:E32 are
the outputs to this model, providing the uncertainty distributions for the correlation coefficients for A:B,
B:C and A:C. The exact formula has been used to calculate the correlation coefficients because the
number of ties can be large relative to the small number of data pairs. ♦

Figure 13.11 Model to add uncertainty to a correlation matrix.

Figure 13.12 Using VoseCorrMat and VoseCorrMatU to calculate a rank order correlation matrix from data.
ModelRisk offers two functions VoseCorrMatrix and VoseCorrMatrixU that will construct the correlation matrix of the data and generate uncertainty about those matrix values respectively, as shown
in the model in Figure 13.12. The functions are particularly useful when you have a large data array
because they use less memory and spreadsheet space and calculate far faster than trying to do the entire
analysis in Excel.
Note that, since the uncertainty distributions for the correlation coefficients in a correlation matrix are
correlated together, the traditional statistics technique by Fisher cannot be used here. Fisher's technique
described the uncertainty about an individual correlation coefficient, but not its relationship to other
correlation coefficients in a matrix, whereas the bootstrap does so automatically.

13.3 Copulas


Quantifying dependence has long been a major topic in finance and insurance risk analysis and has led
to an intense interest in, and development of, copulas, but they are now enjoying increasing popularity
in other areas of risk analysis where one has considerable amounts of data. The rank order correlation
employed by most Monte Carlo simulation tools is certainly a meaningful measure of dependence but is
very limited in the patterns it can produce, as discussed above. Copulas offer a far more flexible method
for combining marginal distributions into multivariate distributions and offer an enormous improvement
in capturing the real correlation pattern. Understanding the mathematics is a little more onerous but
is not all that important if you just want to use it as a correlation tool, so feel free to skim over the
equations a bit. In the following presentation of copulas, I have used the formulae for a bivariate copula
to keep them reasonably readable and show graphs of bivariate copulas, but keep in mind that the ideas
extend to multivariate copulas too. I start off with an introduction to some copulas from a theoretical
viewpoint, and then look at how we can use them in models. Cherubini et al. (2004) is a very thorough


and readable exploration of copulas and gives algorithms for their generation and estimation, some of
which we use in ModelRisk.
A d-dimensional copula C is a multivariate distribution with uniformly distributed marginals U(0, 1)
on [0, 1]. Every multivariate distribution F with marginals F₁, F₂, ..., F_d can be written as

F(x₁, x₂, ..., x_d) = C(F₁(x₁), F₂(x₂), ..., F_d(x_d))

for some copula C (this is known as Sklar's theorem). Because the copula of a multivariate distribution
describes its dependence structure, we can use measures of dependence that are copula based. The
concordance measures Kendall's tau and Spearman's rho, as well as the coefficient of tail dependence,
can, unlike the linear correlation coefficient, be expressed in terms of the underlying copula alone.
I will focus particularly on Kendall's tau, as the relationships between the value of Kendall's tau (τ)
and the parameters of the copulas discussed in this section are quite straightforward.
The general relationship between Kendall's tau of two variables X and Y and the copula C(u, v) of
the bivariate distribution function of X and Y is

τ(X, Y) = 4 ∬ C(u, v) dC(u, v) - 1, the integral being taken over [0, 1]²

This relationship gives us a tool for fitting a copula to a dataset: we simply determine Kendall's tau for the
data and then apply a transformation to get the appropriate parameter value(s) for the copula being fitted.
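For example, the following sketch estimates Kendall's tau from (simulated) data with scipy and converts it to a Clayton or Gumbel parameter using the relationships quoted later in this section:

import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(5)

# Simulated stand-in for a bivariate dataset.
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.6, size=500)

tau = kendalltau(x, y)[0]

# Moment-style estimates via the tau relationships given below.
alpha_clayton = 2 * tau / (1 - tau)     # from tau = alpha / (alpha + 2)
alpha_gumbel = 1 / (1 - tau)            # from tau = 1 - 1/alpha
print(round(tau, 3), round(alpha_clayton, 2), round(alpha_gumbel, 2))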

13.3.1 Archimedean copulas
An important class of copulas - because of the ease with which they can be constructed and the nice
properties they possess - are the Archimedean copulas, which are defined by

C(u, v) = φ⁻¹(φ(u) + φ(v))

where φ is the generator of the copula, which I will explain later. The general relationship between
Kendall's tau and the generator φ(t) of an Archimedean copula for a bivariate dataset can be written as

τ = 1 + 4 ∫₀¹ φ(t)/φ′(t) dt

For example, the relationship between Kendall's tau and the Clayton copula parameter α for a bivariate
dataset is given by

τ = α/(α + 2)

The definition doesn't extend to a multivariate dataset of n variables because there will be multiple values
of tau, one for each pairing. However, one can calculate tau for each pair and use the average, i.e.

τ̄ = 2/(n(n - 1)) Σ_{i<j} τ(Xᵢ, Xⱼ)


There are three Archimedean copulas in common use: the Clayton, Frank and Gumbel. These are
discussed below.

The Clayton copula

The Clayton copula is an asymmetric Archimedean copula exhibiting greater dependence in the negative
tail than in the positive, as shown in Figure 13.13.
This copula is given by:

C(u, v) = max[(u⁻ᵅ + v⁻ᵅ - 1), 0]^(-1/α)

and its generator is

φ(t) = (1/α)(t⁻ᵅ - 1)

where α ∈ [-1, ∞)\{0}, meaning α is greater than or equal to -1 but can't take a value of zero.
The relationship between Kendall's tau and the Clayton copula parameter α for a bivariate dataset is
given by

τ = α/(α + 2)

The model in Figure 13.14 generates a Clayton copula for four variables.

Figure 13.13 Plot of two marginal distributions using 3000 samples taken from a Clayton copula with α = 8.


Figure 13.14 Model to generate values from a Clayton(alpha) copula.
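A Python sketch of Clayton sampling using the well-known gamma-frailty (Marshall-Olkin) construction rather than the spreadsheet layout of Figure 13.14; it is valid for α > 0 and the function name is mine:

import numpy as np

def clayton_copula(alpha, n_vars, n_samples, rng):
    """Sample from a Clayton copula (alpha > 0) via a shared Gamma(1/alpha) frailty:
    U_i = (1 + E_i / W) ** (-1/alpha) with E_i ~ Expon(1), W ~ Gamma(1/alpha, 1)."""
    w = rng.gamma(shape=1.0 / alpha, scale=1.0, size=(n_samples, 1))
    e = rng.exponential(size=(n_samples, n_vars))
    return (1.0 + e / w) ** (-1.0 / alpha)

rng = np.random.default_rng(11)
u = clayton_copula(alpha=8, n_vars=4, n_samples=3000, rng=rng)
# Feed each column through an inverse marginal cdf to obtain correlated variables.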

The Gumbel copula

The Gumbel copula (a.k.a. the Gumbel-Hougaard copula) is an asymmetric Archimedean copula, exhibiting
greater dependence in the positive tail than in the negative, as shown in Figure 13.15.
This copula is given by

C(u, v) = exp(-[(-ln u)ᵅ + (-ln v)ᵅ]^(1/α))

and its generator is

φ(t) = (-ln t)ᵅ

where α ∈ [1, ∞). The relationship between Kendall's tau and the Gumbel copula parameter α for a
bivariate dataset is given by

α = 1/(1 - τ)

The model in Figure 13.16 shows how to generate the Gumbel copula.
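A corresponding sketch for the Gumbel copula. It uses a positive stable frailty simulated with Kanter's method, which is one standard construction (the spreadsheet in Figure 13.16 uses a similar stable-variable approach); treat it as a sketch to validate rather than a reference implementation:

import numpy as np

def gumbel_copula(alpha, n_vars, n_samples, rng):
    """Sample from a Gumbel copula (alpha >= 1). The shared frailty S is positive
    stable with Laplace transform exp(-t**(1/alpha)), simulated by Kanter's method,
    and U_i = exp(-(E_i / S) ** (1/alpha)) with E_i ~ Expon(1)."""
    a = 1.0 / alpha
    theta = rng.uniform(0, np.pi, size=(n_samples, 1))
    w = rng.exponential(size=(n_samples, 1))
    s = (np.sin(a * theta) / np.sin(theta) ** (1 / a)
         * (np.sin((1 - a) * theta) / w) ** ((1 - a) / a))
    e = rng.exponential(size=(n_samples, n_vars))
    return np.exp(-(e / s) ** a)

rng = np.random.default_rng(13)
u = gumbel_copula(alpha=5, n_vars=2, n_samples=3000, rng=rng)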
Figure 13.15 Plot of two marginal distributions using 3000 samples taken from a Gumbel copula with α = 5.

Figure 13.16 Model to generate values from a Gumbel(theta) copula.

The Frank copula

The Frank copula is a symmetric Archimedean copula, exhibiting an even, sausage-type correlation
structure as shown in Figure 13.17.
This copula is given by

C(u, v) = -(1/α) ln[1 + (e^(-αu) - 1)(e^(-αv) - 1)/(e^(-α) - 1)]

and its generator is

φ(t) = -ln[(e^(-αt) - 1)/(e^(-α) - 1)]

where α ∈ (-∞, ∞)\{0}. The relationship between Kendall's tau and the Frank copula parameter α for
a bivariate dataset is given by

τ = 1 - (4/α)[1 - D₁(α)]

where

D₁(α) = (1/α) ∫₀^α t/(eᵗ - 1) dt

is a Debye function of the first kind. There is a simple way to generate values for the Frank copula
using the logarithmic distribution, as shown by the following model in Figure 13.18.

Figure 13.17 Plot of two marginal distributions using 3000 samples taken from a Frank copula with α = 8.

Figure 13.18 Model to generate values from a Frank(theta) copula.
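In Python, the logarithmic-distribution construction reads roughly as follows for α > 0 (numpy's logseries plays the role of the logarithmic distribution; the function name is mine):

import numpy as np

def frank_copula(alpha, n_vars, n_samples, rng):
    """Sample from a Frank copula (alpha > 0) using a logarithmic-series frailty:
    V ~ Logarithmic(1 - exp(-alpha)), E_i ~ Expon(1),
    U_i = -(1/alpha) * ln(1 + exp(-E_i/V) * (exp(-alpha) - 1))."""
    v = rng.logseries(1 - np.exp(-alpha), size=(n_samples, 1)).astype(float)
    e = rng.exponential(size=(n_samples, n_vars))
    return -np.log1p(np.exp(-e / v) * (np.exp(-alpha) - 1.0)) / alpha

rng = np.random.default_rng(17)
u = frank_copula(alpha=8, n_vars=2, n_samples=3000, rng=rng)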

13.3.2 Elliptical copulas
Elliptical copulas are simply the copulas of elliptically contoured (or elliptical) distributions. The most
commonly used elliptical distributions are the multivariate normal and Student t-distributions. The
key advantage of elliptical copulas is that one can specify different levels of correlation between the
marginals, and the key disadvantages are that elliptical copulas do not have closed-form expressions
and are restricted to having radial symmetry. For elliptical copulas the relationship between the linear
correlation coefficient ρ and Kendall's tau is given by

ρ(X, Y) = sin(πτ/2)

The normal and Student t-copulas are described below.
The normal copula
The normal copula (Figure 13.19) is an elliptical copula given by

C(u, v) = Φ_ρ(Φ⁻¹(u), Φ⁻¹(v))

where Φ⁻¹ is the inverse of the univariate standard normal distribution function, Φ_ρ is the bivariate
standard normal distribution function with correlation ρ, and ρ, the linear correlation coefficient, is the
copula parameter.
The relationship between Kendall's tau and the normal copula parameter ρ is given by

ρ(X, Y) = sin(πτ/2)

The normal copula is generated by first generating a multinormal distribution with mean vector {0} and
the required covariance matrix, and then transforming these values into percentiles of a Normal(0, 1)
distribution, as shown by the model in Figure 13.20.

Figure 13.19 Graph of 3000 samples taken from a bivariate normal copula with parameter ρ = 0.95.

Figure 13.20 Model to generate values from a normal copula.
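A sketch of that two-step recipe in Python, with an illustrative correlation matrix of 0.95s like the one in Figure 13.20:

import numpy as np
from scipy.stats import norm

def normal_copula(corr_matrix, n_samples, rng):
    """Generate uniform marginals with a normal copula: draw multivariate normals
    with the given correlation matrix, then map each to its Normal(0,1) percentile."""
    dim = len(corr_matrix)
    z = rng.multivariate_normal(np.zeros(dim), corr_matrix, size=n_samples)
    return norm.cdf(z)

rng = np.random.default_rng(19)
corr = np.array([[1.0, 0.95, 0.95],
                 [0.95, 1.0, 0.95],
                 [0.95, 0.95, 1.0]])     # as in the Figure 13.20 example
u = normal_copula(corr, 3000, rng)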
The Student t-copula (or just "the t-copula")

The Student t-copula is an elliptical copula defined as

C(u, v) = t_{ν,ρ}(t_ν⁻¹(u), t_ν⁻¹(v))

where t_{ν,ρ} is the bivariate Student t-distribution function with correlation ρ and ν degrees of freedom,
t_ν⁻¹ is the inverse of the univariate Student t-distribution function, and ν (the number of degrees of
freedom) and ρ (the linear correlation coefficient) are the parameters of the copula. When the number
of degrees of freedom ν is large (around 30 or so), the copula converges to the normal copula, just as
the Student distribution converges to the normal. But for a limited number of degrees of freedom the
behaviour of the copulas is different: the t-copula has more points in the tails than the normal copula.

Figure 13.21 Graph of 3000 samples taken from a bivariate Student t-copula with ν = 2 degrees of freedom and parameter ρ = 0.95.


As in the normal case (and also for all other elliptical copulas), the relationship between Kendall's
tau and the Student t-copula parameter ρ is given by

ρ(X, Y) = sin(πτ/2)

Fitting a Student t-copula is slightly more complicated than fitting the normal. We first estimate τ
and then, starting with ν = 2, we determine the likelihood of observing the dataset. Then we repeat the
exercise for ν = 3, 4, ..., 50 and find the combination that produces the maximum likelihood. For ν
values of 50 or more there will be no discernible difference from using a fitted normal copula, which is
simpler to generate values from.
Generating values from a Student copula requires determining the Cholesky decomposition of the
covariance matrix, as shown by the model in Figure 13.22.
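A Python sketch of the same construction: the Cholesky factor is applied to independent normals, the result is scaled by a chi-squared draw to give multivariate t values, and these are mapped through the Student t distribution function; ν and the correlation matrix are illustrative:

import numpy as np
from scipy.stats import t as student_t

def t_copula(corr_matrix, nu, n_samples, rng):
    """Generate uniforms with a Student t-copula via the Cholesky decomposition."""
    corr = np.asarray(corr_matrix, dtype=float)
    chol = np.linalg.cholesky(corr)                     # lower-triangular factor
    z = rng.standard_normal((n_samples, corr.shape[0]))
    chi2 = rng.chisquare(nu, size=(n_samples, 1))
    x = (z @ chol.T) * np.sqrt(nu / chi2)               # multivariate t samples
    return student_t.cdf(x, df=nu)

rng = np.random.default_rng(23)
corr = np.array([[1.0, 0.95],
                 [0.95, 1.0]])
u = t_copula(corr, nu=2, n_samples=3000, rng=rng)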

Figure 13.22 Model to generate values from a Student copula (using the array functions {=VoseCholesky(B4:F8)} and {=MMULT(B12:F16,B19:B23)} together with =VoseNormal(0,1) samples and a chi-squared draw).

13.3.3 Modelling with copulas
In order to make use of copulas in your risk analysis, you need three things:
1. A method to estimate its parameter(s), which has been described above.
2. A model that generates the copula described above.
3. Functions that use the inversion method to generate values from the marginal distributions to which
you wish to apply the copula. Excel offers a very limited number of such functions,² but they are
notoriously inaccurate and unstable. You can derive many other inversion functions from the F(x)
equations in Appendix III.
Let's say that we have a dataset of 1000 joint observations for each of five variables, we fit the
data to gamma distributions for each variable and we correlate them together with a normal copula. In
principle one could do all these things in Excel, but it would be a pretty large spreadsheet, so I am
going to compromise a little. (By the way, I am using gamma distributions here so I can make a model
that works with Excel, though be warned that Excel's GAMMAINV is one of the most unstable). In the
model in Figure 13.23 I am also fitting a marginal gamma distribution to each variable using the method
Figure 13.23 A model using copulas.
² BETAINV, CHIINV, FINV, GAMMAINV, LOGINV, NORMINV, NORMSINV and TINV.


Figure 13.24 The same model as in Figure 13.23, but now in ModelRisk. The copula is fitted with the array function {=VoseCopulaMultiNormalFit($B$3:$F$1002,FALSE)} and each marginal with =VoseGammaFit(B3:B1002,I4).

of moments: usually you would want to use maximum likelihood, but this involves optimisation, so the
method of moments is easier to follow, and with 1000 data points there won't be much difference. I
am also foregoing the rather elaborate calculations needed to estimate the normal copula's covariance
matrix by using Excel's CORREL as an approximation. I have used ModelRisk's normal copula function
because it takes up less space, and I have already shown you how to generate this copula above.
The model in Figure 13.24 is the equivalent with ModelRisk.
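For readers working outside Excel, the same steps can be reproduced in a few lines of code. The sketch below is an illustration only, assuming the numpy and scipy libraries and an (n x k) array of joint observations called data; the function names are mine, not ModelRisk's.

# Sketch: fit gamma marginals by the method of moments and correlate them
# with a normal (Gaussian) copula. "data" is an (n x k) array of joint
# observations; names are illustrative.
import numpy as np
from scipy import stats

def fit_gamma_moments(x):
    # Method of moments: alpha = mean^2 / variance, scale = variance / mean
    m, v = x.mean(), x.var(ddof=1)
    return m * m / v, v / m

def normal_copula_samples(data, n_samples, rng=np.random.default_rng()):
    params = [fit_gamma_moments(col) for col in data.T]
    # Approximate the copula's correlation matrix with the linear correlation
    # of the data, just as the text uses Excel's CORREL as an approximation
    corr = np.corrcoef(data, rowvar=False)
    z = rng.multivariate_normal(np.zeros(data.shape[1]), corr, size=n_samples)
    u = stats.norm.cdf(z)                      # correlated uniform marginals
    # Invert each fitted gamma marginal at the copula's uniform values
    return np.column_stack([stats.gamma.ppf(u[:, i], a, scale=s)
                            for i, (a, s) in enumerate(params)])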

13.3.4 Making a special case of bivariate copulas
In the standard formulation for copulas there is no distinction between a bivariate (only two marginals)
and a multivariate (more than two marginals) copula. However, we can manipulate a bivariate copula
greatly to extend its applicability.
Sometimes, when creating a certain model, one is interested in a particular copula (say the Clayton
copula), but with a greater dependence in the positive tails than in the negative (a Clayton copula has
greater dependence in the negative tail than in the positive, see Figure 13.13 above).
For a bivariate copula it is possible to change the direction of the copula by calculating 1 - X, where X is one of the copula outputs. For example:
{A1:A2}  Clayton copula with α = 8
B1  =1-A1
B2  =1-A2
A scatter plot of B1:B2 is now as in Figure 13.25.
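The same reversal is easy to reproduce outside the spreadsheet. The Python sketch below, an illustration only, generates a bivariate Clayton(8) copula by the standard conditional sampling method and then reverses both directions by taking 1 - u; the function name is mine.

# Sketch: reverse the direction of a bivariate Clayton copula by taking 1 - u.
import numpy as np

def clayton_bivariate(theta, n, rng=np.random.default_rng()):
    u1 = rng.uniform(size=n)
    t = rng.uniform(size=n)
    # Conditional sampling formula for the Clayton copula
    u2 = (u1 ** -theta * (t ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
    return u1, u2

u1, u2 = clayton_bivariate(8, 3000)
b1, b2 = 1 - u1, 1 - u2        # both directions reversed, as in Figure 13.25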
ModelRisk offers an extra parameter to allow control over the possible directional combinations. For
Clayton and Gumbel copulas there are four possible directions, but for the Frank there are just two
possibilities since it is symmetric about its centre. The plots in Figures 13.26 and 13.27 illustrate the
four possible bivariate Clayton copulas (1000 samples) with parameter α = 15 and the two possible bivariate Frank copulas (1000 samples) with parameter 21.
Estimation of which direction gives the closest fit to data simply requires that one repeat the fitting
methods described above, calculate the likelihood of the data for each direction and select the direction
with the maximum likelihood. ModelRisk has bivariate copula functions that do this directly, returning
either the parameters of the fitted copula or generating values from a fitted copula.


Figure 13.25 Graph of 3000 samples taken from a bivariate Clayton(8) with both directions reversed.

13.3.5 An empirical copula
In spite of the extra flexibility afforded by the copulas I have introduced in this chapter over rank order correlation, you can see that they still rely on a symmetrical relationship between the variables: draw
a line between (0, 0) and (1, 1) and you get a symmetric pattern about that line (assuming you didn't
alter the copula direction). Unfortunately, real-world variables tend to have other ideas. As risk analysts,
we put ourselves in a difficult situation if we try to squeeze data into a model that just doesn't fit. An
empirical copula gives us a possible solution. Provided we have a good amount of observations, we can
bootstrap the ranks of the data to construct an approximation to an empirical copula, as the model in
Figure 13.28 demonstrates.
The model above uses the empirical estimate rank/(n + 1) described in Section 10.2 for the quantile that should be associated with a value in a set of n data points. The VoseStepUniform distribution simply picks a random integer value between 1 and the number of observations (1000).
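A minimal Python sketch of the same bootstrap is given below, assuming the observations sit in an (n x k) numpy array called data and using the rank/(n + 1) convention from the text; everything else is illustrative.

# Sketch: approximate empirical copula built by bootstrapping the ranks of the data.
import numpy as np

def empirical_copula(data, n_samples, rng=np.random.default_rng()):
    n = data.shape[0]
    # Ranks 1..n within each column (ties are unlikely with continuous data)
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    rows = rng.integers(0, n, size=n_samples)   # like VoseStepUniform(1, n)
    return ranks[rows] / (n + 1)                # quantiles to feed the marginal distributions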
This method is very general and will replicate any correlation structure that the data show. It will be
rather slow in Excel when you have large datasets because each RANK function goes through the whole
array of data for a variable to determine its rank - it would be more efficient to use the VoseRank array
function which will take far fewer passes through the data. However, the main drawback to this method
occurs when we have relatively few observations. For example, if we have just nine observations, the empirical copula will only generate values of {0.1, 0.2, . . . , 0.9}, so our model will only generate between the 10th and 90th percentiles of the marginal distributions.
This problem can be corrected by applying some order statistics thinking along the lines of Equations
10.4 and 10.5. The ModelRisk function VoseCopulaData encapsulates that thinking. In the model in
Figure 13.29 there are just 21 observations, so any correlation structure is only vaguely known.
The plots in Figure 13.30 show how the VoseCopulaData performs. The large grey dots are the data
and the small dots are 3000 samples from the empirical copula: notice that the copula extends over (0, 1) for all variables and fills in the areas between the observations, with the greatest density concentrated around the observations.

Figure 13.26 The four directional possibilities for a bivariate Clayton copula.

13.4 The Envelope Method
The envelope method offers a more flexible technique for modelling dependencies that is both intuitive
and easy to control. It models the logic whereby the value of the independent variable statistically
determines the value of the dependent variable. Its drawback is that it requires considerably more effort
than rank order correlation and is therefore only really used where the dependency relationship is going
to produce a significant effect on the final outcome of the model.

Figure 13.27 The two directional possibilities for a bivariate Frank copula.

Figure 13.28 Constructing an approximate empirical copula from data.

13.4.1 Using the envelope method for approximate modelling of straight-line
correlation in observed data
A large number of observed correlations can be quite adequately modelled using a straight-line relationship, as already discussed. If this is the case, the following techniques can prove very valuable. However, you may sometimes come across a dependency relationship that is curvilinear and/or has a vertical spread that changes across the range of the independent variable. The bottom graphs in Figure 13.3 illustrate curvilinear relationships. The following section offers some advice on how the envelope method can still be used to model such relationships.

Figure 13.29 Constructing an empirical copula with few data using ModelRisk (array formula: {=VoseCopulaData($B$3:$D$23)}).

Using a uniform distribution

The envelope method first requires that all available data are plotted in a scatter plot. The independent
variable is plotted on the x axis and the dependent variable on the y axis. Bounding lines are then
determined that contain the minimum and maximum observed values of the dependent variable for all
values of the independent variable.
Example 13.4

Data on the time that 40 participants took to practise making a wicker basket were negatively correlated to the time they took to make the basket in a subsequent test, shown in Figure 13.31. Two straight lines, drawn by eye, neatly contain all of the data points: a minimum line of y = -0.28x + 57 and a maximum line of y = -0.42x + 88. The data look to be roughly vertically uniformly distributed between these two lines for all values of the x axis. We could therefore predict the test time that would be taken for any value of the practice time as follows:

Test time = Uniform(-0.28 * Practice time + 57, -0.42 * Practice time + 88)

Figure 13.30 Scatter plots of random samples from the empirical copula fitted to the data in Figure 13.29.

Figure 13.31 Setting boundary lines for the envelope method of modelling dependencies (max = -0.42x + 88, min = -0.28x + 57; x axis: practice time in hours).

Figure 13.32 Dependency model using the envelope method with a uniform distribution.

We have thus defined a uniform distribution for the test time that varies according to the practice time
taken. If we believe that the practice time that will be taken by future workers is Triangle(0, 20, 60), we
can use this dependency model to generate the distribution of test times as illustrated in the spreadsheet
of Figure 13.32.
Consider the Triangle(0, 20, 60) generating a value of 30 in one iteration (see Figure 13.33). The equation for the minimum test time produces a value of -0.28 * 30 + 57 = 48.6. The equation for the maximum test time produces a value of -0.42 * 30 + 88 = 75.4. Thus, for this iteration, the value for the test time will be generated from a Uniform(48.6, 75.4) distribution.
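The arithmetic of this model is easy to reproduce in code. The Python sketch below follows Example 13.4 with numpy's triangular and uniform generators and the two boundary lines given above; it is an illustration, not part of the spreadsheet model.

# Sketch of the uniform envelope model of Example 13.4: sample a practice time,
# compute the two boundary lines, and draw the test time uniformly between them.
import numpy as np

rng = np.random.default_rng()
n = 10000
practice = rng.triangular(0, 20, 60, size=n)    # Triangle(0, 20, 60)
lo = -0.28 * practice + 57                      # minimum line
hi = -0.42 * practice + 88                      # maximum line
test_time = rng.uniform(lo, hi)                 # Uniform(min line, max line)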
The above example is a little simplistic. Using a uniform distribution to model the dispersion between
the minimum and maximum lines obviously gives equal weighting to all values within the range. It is
quite simple to extend this technique to using a triangular or normal distribution in place of the uniform
approximation, both of which provide a central tendency that is more realistic.


Figure 13.33 Illustration of how the dependency model of Figure 13.32 works.
Using a triangular distribution

Employing a triangular distribution requires that, in addition to the minimum and maximum lines, we
also provide the equation of a line that defines the most likely value for the dependent variable for each
value of the independent variable. The triangular distribution is still a fairly approximate modelling tool.
It is therefore quite reasonable to draw a line through the points of greatest vertical density. Alternatively,
you may prefer to find the least-squares fit line through the available data. All professional spreadsheet
programs now offer the facility to find this line automatically, making the task very simple. A third
option is to say that the most likely value lies midway between the minimum and maximum.
Example 13.5

Figures 13.34 and 13.35 provide an illustration of the envelope method with triangular distributions.

Figure 13.34 Dependency model using the envelope method with a triangular distribution.

Figure 13.35 Illustration of how the dependency model of Figure 13.34 works.

Using a normal distribution

This option involves running a least-squares regression analysis and finding the equation of the least-squares line and the standard error of the y-estimate, Syx. The Syx statistic is the standard deviation of the vertical distances of each point from the least-squares line. Least-squares regression assumes that the error of the data about the least-squares line is normally distributed. Thus, if y = ax + b is the equation of the least-squares line, we can model the dependent distribution as y = Normal(ax + b, Syx).
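For those working outside a spreadsheet, the sketch below shows the same calculation in Python: fit the least-squares line, estimate Syx from the residuals (with n - 2 degrees of freedom), and sample the dependent variable from Normal(ax + b, Syx). The function names are illustrative.

# Sketch: envelope method with a normal distribution. x and y are arrays of
# observed independent and dependent values.
import numpy as np

def fit_normal_envelope(x, y):
    a, b = np.polyfit(x, y, 1)        # least-squares line y = a*x + b
    residuals = y - (a * x + b)
    syx = residuals.std(ddof=2)       # standard error of the y-estimate
    return a, b, syx

def sample_dependent(x_new, a, b, syx, rng=np.random.default_rng()):
    return rng.normal(a * x_new + b, syx)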

Example 13.6

Figure 13.36 provides an illustration of the envelope method with normal distributions.


Comparison of the uniform, triangular and normal methods

Figure 13.37 compares how the three envelope methods behave. The graphs on the left cross-plot a
Triangle(0, 20, 60) for the practice time (x axis) against the resulting test time (y axis). The graphs on
the right show histograms of the resulting test time distributions.
The uniform method produces a scatter plot that is vertically evenly distributed and strongly bounded.
The test time histogram has the flattest shape with the widest "shoulders" of the three methods. The
triangular method produces a scatter plot that has a vertical central tendency and is also strongly bounded.
The histogram is the most peaked of the three methods, producing the smallest standard deviation. The
normal method produces a scatter plot that has a vertical central tendency but that is unbounded. This
will generally be a closer approximation to a plot of available data. The histogram has the widest range
of the three methods. Using the normal distribution has two advantages over the other two methods:
the equation of the line and standard deviation are both calculated directly from the available data and
don't involve any subjective estimation; and the unbounded nature of the normal distribution gives the
opportunity for generated values to fall outside the range of the observed values. This second point may
help ensure that the range of the dependent distribution is not underestimated.

Figure 13.36 Using the normal distribution to model a dependency relationship (regression line: y = -0.4594x + 74.51, Syx = 8.16).

Finally, it is important to be sure that the formula you develop will be valid over the entire range of
values that are to be generated for the two variables. For example, the normal formula can potentially
generate negative values for test time. It could, however, be mathematically restricted to prevent a
negative tail, for example by using an IF (test-time < 0, 0, test-time) statement.

13.4.2 Using the envelope method for non-linear correlation observed from
available data
One may come across a correlation relationship that cannot be adequately modelled using a straight-line
fit, as in the examples of Section 13.4.1. However, with a little extra work, the techniques described
above can be adapted to model most relationships.
The first stage is to find the best curvilinear line that fits the data. Microsoft Excel, for example,
offers a choice of automatic line fitting: linear, logarithmic, polynomial (up to sixth order), power
and exponential. Several of these fitted lines can be overlaid on the data to help determine the most
appropriate equation. The second stage is to use the equation of the selected line to determine the
predicted values for the dependent variable for each value of the independent variable. The differences between the observed and predicted values of the dependent variable (i.e. the error terms) are then calculated and cross-plotted against the independent variable. The third stage is to determine how these error terms should be modelled. Any of the three techniques described in Section 13.4.1 could be used.
The final stage is to combine the equation of the best-fit line with the distribution for the error term.
Example 13.7

Data on the amount of money a cosmetic company spends on advertising the launch of a new product are
compared with the volume of initial orders it receives (Figure 13.38) and cross-plotted in Figure 13.39.
Clearly, the relationship is not linear: an example of the law of diminishing returns. The best-fit line is
determined to be logarithmic: y = 1374.8 * LN(x) - 10713. The error terms appear to have approximately the same distribution across the whole range of advertising budget values. Since the distribution

Figure 13.37 Comparison of the results of the envelope method of modelling dependency using uniform, triangular and normal distributions.

of error terms appears to have a greater concentration around zero, we might assume that they are normally distributed and calculate their standard deviation (= 126 from Figure 13.38). The final equation for the total initial order can then be written as

Total initial order = 1374.8 * LN(advertising budget) - 10 713 + Normal(0, 126)

or, equivalently,

Total initial order = Normal(1374.8 * LN(advertising budget) - 10 713, 126)
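As a hedged illustration of the four stages, the Python sketch below fits the logarithmic line to the advertising budget and initial orders, treats the residuals as the error terms, and adds a Normal(0, s) error when predicting; the variable names are mine.

# Sketch of the curvilinear envelope of Example 13.7: fit y = m*ln(x) + c, take
# the residuals as the error terms and model them as Normal(0, s).
import numpy as np

def fit_log_envelope(budget, orders):
    m, c = np.polyfit(np.log(budget), orders, 1)     # y = m*ln(x) + c
    errors = orders - (m * np.log(budget) + c)
    return m, c, errors.std(ddof=1)

def sample_orders(budget_new, m, c, s, rng=np.random.default_rng()):
    return rng.normal(m * np.log(budget_new) + c, s)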

Figure 13.38 Analysis of data and error terms for a curvilinear regression for Example 13.7. (Formulae: D4:D17 =C4-(1374.8*LN(B4)-10713); standard deviation =STDEV(D4:D17).)

Figure 13.39 The best-fitting non-linear correlation for the data of Example 13.7.

13.4.3 Using the envelope method to model expert opinion of correlation
It is very difficult to get an intuitive feel for rank order correlation coefficients, even when one is familiar
with probabilistic modelling. It is therefore recommended that the more intuitive envelope method be
employed for the modelling of an expert's opinion of a dependency where that dependency is likely to
have a large impact.


The technique involves the following steps:

Discuss with the expert the logic of how he or she perceives the relationship between the two variables to be correlated. Review any available data.
Determine the independent and dependent variables. If the causal relationship is unclear, select either to be the independent variable according to which will be easiest.
Define the range of the independent variable and determine its distribution (using a technique from Chapter 9 or 10).
Select several values for the independent variable. These values should include the minimum and maximum and a couple of strategic points in between.
Ask the expert his or her opinion of the minimum, most likely and maximum values for the dependent variable should each of these selected values of the independent variable occur. I often prefer to ask for the practical minimum and maximum.
Plot these values on a scatter diagram and find the best-fit lines through the three sets of points (minima, most likely values and maxima).
Check that the expert agrees the plot is consistent with his or her opinion.
Use these equations of the best-fit lines in a triangular or PERT distribution to define the dependent variable, as illustrated in the sketch after this list.
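A minimal sketch of the last two steps is given below, assuming the elicited points are held in arrays and using the triangular option; the function names are illustrative only.

# Sketch: fit straight lines through the elicited minimum, most likely and maximum
# points, then model the dependent variable as a triangular distribution whose
# parameters vary with the independent variable.
import numpy as np

def fit_envelope_lines(x, y_min, y_ml, y_max):
    # One (slope, intercept) pair per elicited line
    return [np.polyfit(x, y, 1) for y in (y_min, y_ml, y_max)]

def sample_dependent(x_new, lines, rng=np.random.default_rng()):
    lo, ml, hi = (a * x_new + b for a, b in lines)   # requires lo < ml < hi
    return rng.triangular(lo, ml, hi)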
Example 13.8

Figure 13.40 illustrates an example where the expert is defining the relationship between a bank's
average mortgage rate and the number of new mortgages it will sell. The expert has given her opinion
of the practical minimum, most likely and practical maximum values of the number of new mortgages
for four values of the mortgage rate, as shown in Table 13.1. She has defined practical minimum and

Figure 13.40 An example of the use of the envelope method to model an expert's opinion of a dependency relationship or correlation (x axis: mortgage rate, 6 % to 14 %).


Table 13.1 Data from expert elicitation: mortgage rate (%) against the expert's minimum, most likely and maximum number of new mortgages.

maximum to mean, for her, that there is only a 5 % chance that the number of new mortgages will fall below and above those values respectively.
This technique has the advantage of being very intuitive. The expert is asked questions that are
both meaningful and easy to think about. It also has the advantage of avoiding the need to define
the distribution shape for a dependent variable: the shape will be dictated by its relationship to the
independent variable.


13.4.4 Adding uncertainty in the envelope method
It is a relatively simple matter to add uncertainty into the envelope method. If data exist to develop the
dependency relationship, one can use the bootstrap method or traditional statistics to give uncertainty
distributions for the least-squares fit parameters. Uncertainty about the boundaries can be included by
simply looking at the best-guess line, as well as extreme possibilities for the minimum and maximum
boundaries on y.

13.5 Multiple Correlation Using a Look-Up Table
There may be times when it is necessary to model the simultaneous effect of an external factor on
several parameters within a model. An example is the effect of poor weather on a construction site. The
times taken to do an archaeological survey of the land, dig out the foundations, put in the form work,
build the foundations, construct the walls and floors and assemble the roof could all be affected by the
weather to varying degrees.
A simple method of modelling such a scenario is to use a spreadsheet look-up table.
Example 13.9

Figure 13.41 illustrates the example above, showing the values for one particular iteration. The model works as follows:

Cells D5:D10 list the estimates of duration of each activity if the weather is normal.
The look-up table F4:J10 lists the percentages by which the activities will increase or decrease owing to the weather conditions.
Cell D13 generates a value for the weather from 1 to 5 using a discrete distribution that reflects the relative likelihood of the various weather conditions.

Cells E5:E10 add the appropriate percentage change for that iteration to the base estimate time by looking it up in the look-up table.
Cell E11 adds up all the revised durations to obtain the total construction time.

Figure 13.41 Using a look-up table to model multiple dependencies. Formulae: D5:D10 =VoseTriangle(3,4,6), =VoseTriangle(9,11,13), etc.; E5:E10 =D5*(1+HLOOKUP(D$13,F$4:J$10,B5)); D13 =VoseDiscrete({1,2,3,4,5},{2,5,4,3,2}); E11 =SUM(E5:E10).
It is a simple matter to include uncertainty in this technique. One needs simply to add uncertainty distributions for the magnitude of effect (in this case, the values in cells F5:J10). A little care is needed if the uncertainty distributions overlap for an activity. So, for example, if we used a PERT(30 %, 40 %, 50 %) uncertainty distribution for the parameter in cell F5 and a PERT(20 %, 28 %, 35 %) uncertainty distribution for the parameter in cell G5, we could be modelling a simulation where very poor weather increases the archaeological digging time by 31 % but poor weather increases the time by 33 %. Using high levels of correlation for the uncertainty distributions of effect size across a task will remove this problem quite efficiently and reflect that errors in estimating the effect (in this case weather) will probably be similar for each effect size.
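The look-up-table logic translates directly into code. The Python sketch below mirrors Example 13.9 for two of the activities; the base estimates and the discrete weather distribution follow the formulae table above, while the percentage table holds illustrative placeholder values rather than the figures of the model.

# Sketch of the look-up-table method of Example 13.9 for two activities.
import numpy as np

rng = np.random.default_rng()

def discrete(values, weights):
    p = np.array(weights, dtype=float)
    return rng.choice(values, p=p / p.sum())          # like VoseDiscrete

# Base duration estimates (VoseTriangle(3,4,6), VoseTriangle(9,11,13))
base = [rng.triangular(3, 4, 6), rng.triangular(9, 11, 13)]

# Placeholder % change per activity for weather index 1 (very poor) to 5 (very good);
# illustrative numbers only, not those of the book's look-up table
effect = [[0.40, 0.25, 0.0, -0.05, -0.10],
          [0.30, 0.20, 0.0, -0.05, -0.10]]

weather = discrete([1, 2, 3, 4, 5], [2, 5, 4, 3, 2])  # like cell D13
revised = [d * (1 + effect[i][weather - 1]) for i, d in enumerate(base)]
total_time = sum(revised)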

Chapter 14

Eliciting from expert opinion
14.1 Introduction
Risk analysis models almost invariably involve some element of subjective estimation. It is usually
impossible to obtain data from which to determine the uncertainty of all of the variables within the
model accurately, for a number of reasons:
The data have simply never been collected in the past.
The data are too expensive to obtain.
Past data are no longer relevant (new technology, changes in political or commercial environment, etc.).
The data are sparse, requiring expert opinion "to fill in the holes".
The area being modelled is new.
The uncertainty in subjective estimates has two components: the inherent randomness of the variable
itself and the uncertainty arising from the expert's lack of knowledge of the parameters that describe that
variability. In a risk analysis model, these uncertainties may or may not be distinguished, but both types
of uncertainty should at least be accounted for in a model. The variability is best included by assuming
some sort of stochastic model, and the uncertainty is then included in the uncertainty distributions for
the model parameters.
When insufficient data are available to specify the uncertainty of a variable completely, one or more
experts will usually be consulted to provide their opinion of the variable's uncertainty. This chapter
offers guidelines for the analyst to model the experts' opinions as accurately as possible.
I will start by discussing sources of bias and error that the analyst will encounter when collecting
subjective estimates. We then look at a number of techniques used in the modelling of probabilistic
estimates, and particularly the use of various types of distribution. The analyst is then shown how to
employ brainstorming sessions to ensure that all of the available information relevant to the problem is
disseminated among the experts and the uncertainty of the problem openly discussed. Finally, we look
at methods for eliciting expert opinion in one-to-one interviews with the analyst.
Before delving into the techniques of subjective estimation, I would like the reader to consider the
following two points that have been the downfall of many a model I have been asked to evaluate:
Firstly, the most significant subjective estimate in a model is often in designing the structure of
the model itself. It is surprising how often the structure of a model evades criticism while the
figures within it are given all the scrutiny. Before committing to a specific model structure, it is
recommended that the analyst seeks comment from other interested parties as to its validity. In turn,
this action will greatly enhance the analyst's chances of having the model's results accepted and of


receiving cooperation in determining the input uncertainties. Good analysts should take this stage very
seriously and promote an environment where it is possible to provide open criticism of their work.
The second point is that the analysts should not take it upon themselves to provide all of the subjective assessments in a model. This sounds painfully obvious, but it still astounds me how many
analysts believe that they can estimate all or most of the variables within their model by themselves
without consulting others who are closer to the particular problem.

14.2 Sources of Error in Subjective Estimation
Before looking at the techniques for eliciting distributions from an expert, it is very useful to have an
understanding of the biases that commonly occur in subjective estimation. To introduce this subject,
Section 14.2.1 describes two exercises I run in my risk analysis training seminars and that the reader
might find educational to conduct in his or her own organisation. In each exercise the class members have
their own PCs and risk analysis software to help them with their estimates. Section 14.2.2 summarises
the sources of heuristic errors and biases: that is, errors produced by the way people mentally approach
the task of parameter estimating. Finally, Section 14.2.3 looks at other factors that may cause inaccuracy
in the expert's estimates.

14.2.1 Class experiments on estimating
This section looks at two estimating exercises I regularly use in my training seminars on risk analysis
modelling. Their purpose is to highlight some of the thought processes (heuristics) people use to produce
quantitative estimates. The reader should consider the observations from these exercises in conjunction
with the points raised in Section 14.2.2.
Class estimating exercise 1

Each member of the class is asked to provide practical minimum, most likely and practical maximum
estimates for a number of quantities (usually 8). The class is instructed that the minimum and maximum
should be as close as possible to each other such that they are 90 % confident that the true value falls
between them. The class is encouraged to ask questions if anything is unclear.
The quantities being estimated are obscure enough that the class members will not have an exact
knowledge of their values, but hopefully familiar enough that they can have a go at estimating the
value. The questions are changed to be relevant to the country in which the seminar is run. Examples
of these quantities are:
the distance from Oxford to Edinburgh along main highway routes in kilometres;
the area of the United Kingdom in square kilometres;
the mass of the Earth in metric tonnes;
the length of the Nile in kilometres;
the number of pages in the December Vogue UK magazine;
the population of Scranton, USA;
the height of K2, Kashmir, in metres;
the deepest ocean depth in metres.

Figure 14.1 How to draw up class estimates from exercises 1 and 2 on a blackboard (the example quantity shown is the number of pages in the October 1995 UK Cosmopolitan magazine, with each class member's three-point estimate plotted against the actual value).

Each member of the class fills out a form giving the three values for each quantity. When everyone
has completed their forms, I get the class to pick one of these quantities, e.g. the length of the Nile. I
then question each member of the class to find out the minimum and the maximum, i.e. the total range
of all of the estimates they have made. On the blackboard, I draw up a plot of each class member's
three-point estimate, as illustrated in Figure 14.1, and then superimpose the true value. There is almost
invariably an expression of surprise at the true value. Sometimes, after I have drawn all of the estimates
up on the blackboard, I will ask if any of the class wishes to change their estimate before I reveal the
true value. Some will choose to do so, but this rarely increases their chance of encompassing the true
value. I will often repeat this process for four or five of the measurements to collect as many of the
lessons to be learned from the exercise as possible.
Now, if the class members were perfectly calibrated, there would be a 90 % chance (i.e. the defined level above) that each true value would lie within their minimum to maximum range. By "calibrated" I mean that their perceptions of the precision of their knowledge were accurate. If there are eight quantities to be estimated, the number that fall within their minimum to maximum range (their score for this exercise) can be estimated by a Binomial(8, 90 %) distribution, as shown in Figure 14.2.
A host of interesting observations invariably comes out of this exercise. The underlying reasons for
these observations and those of the following exercise are summarised in Section 14.2.2:
In the hundred or so seminars in which I have performed this exercise, I have very rarely seen a
score higher than 6. From Figure 14.2 we can see that there is only a 4 % chance that anyone would
score 5 or less if they were perfectly calibrated. If we take the average score for all members of the
class and assume the distribution of scores to be approximately binomial, we can estimate the real
probability encompassed by their minimum to maximum range. The mean of a binomial distribution
is np, where n is the number of trials (in this case, 8) and p is the probability of success (here,
the probability of falling between the minima and maxima). The average individual score for the
whole class is usually around 3, giving a probability p of = 37.5 %. In other words, where they
were providing a minimum and maximum for which they believed there was a 90 % chance of the
quantity falling between those values, there was in fact only about a 37 % chance. One reason for
this "overconfidence" (i.e. the estimated uncertainty is much smaller than the real uncertainty) is
anchoring, discussed in Section 14.2.2. Figure 14.3 shows the distribution for the largest class for
which I have run this exercise (and the only class for which I kept the results).
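The calibration figures quoted above are easy to verify; the short snippet below, using scipy, confirms that a perfectly calibrated estimator would have only about a 4 % chance of scoring 5 or less, and that an average score of 3 out of 8 implies a capture probability of about 37.5 %.

# Check the calibration arithmetic: scores ~ Binomial(8, 0.9) if the stated
# 90 % intervals were accurate.
from scipy import stats

print(stats.binom.cdf(5, 8, 0.9))    # P(score <= 5) is about 0.038, i.e. roughly 4 %
print(3 / 8)                         # implied capture probability, 0.375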


Figure 14.2 Binomial(8, 90 %) distribution for forecasting test scores.

Figure 14.3 Example of scores produced by a large class in the estimating exercise.

The estimators often confuse the units (e.g. miles instead of kilometres, kilograms instead of tonnes),
resulting in a gross error.
In estimating the population of Scranton, some estimators provide a huge maximum estimate. Since
most people have never heard of Scranton, it makes sense that it has a smaller population than
London, New York, etc., but some people ignore this obvious deduction and offer a maximum that
has no logical basis (their estimation is strongly affected by the fact that they have never heard of
Scranton rather than any logic they could apply to the problem).
When the class discusses the quantities, they can usually agree on a logic for their estimation.
If estimators are very sure of their quantity, they may nonetheless provide an unrealistically large
range given their knowledge ("better to be safe") or, more commonly, provide just slightly too
narrow a range (resulting in a protest when I don't award them a correct answer!). I once asked a


class in New Zealand to estimate the area of their country. A gentleman from their Met Office asked
if that was at low or high tide, to the amusement of us all. He knew the answer precisely, but the
true value fell outside his range because he had not known the precise conversion factor between
acres and square kilometres and had made insufficient allowance for that uncertainty.
If offered the choice of a revision to their estimates after I have drawn them all on the board, those
that change will usually gravitate to any grouping of the others' estimates or to the estimate of an
individual in the group whose opinion is highly valued. These actions often do not get them closer
to the correct answer. This observation has encouraged me to avoid asking for distribution estimates
during brainstorming sessions (see Section 14.4).
In many cases, people who have given a vast range to their estimates (to howls of laughter from the
others) are the only ones to get it inside their range.
People attending my seminars are almost always computer literate, but it is surprising how many
have little feel for numbers and offer estimates that could not possibly be correct.
Faced with a quantity that seems impossible to quantify at first, the estimator can often arrive at a
reasonable estimate by being encouraged to either break the quantity down into smaller components
or to make a comparison with other quantities. For example, the mass of the Earth could be estimated
by first estimating the average density of rock and then multiplying it by an estimate of the volume
of the Earth (requiring an estimate of its radius or circumference). Occasionally, this method has
come up with some huge errors where the estimator has confused formulae for the volume of a
sphere with area, etc.
Very occasionally, individuals lacking confidence will refuse to read out their opinions to the class.
Sometimes, estimators will provide a set of answers without really understanding the quantity they
are estimating (e.g. not knowing that K2 is a mountain, the second highest in the world). Note
that the person in question did not seek clarification, even after being encouraged to do so. This
"shyness" seems to be much more common in some nationalities than others.
This exercise can legitimately be criticised on several points:
1. The class members are asked to estimate quantities that they have no real knowledge of and their

score is therefore not reflective of their ability to estimate quantities that would be required of them
in their work.
2. In most real-life problems, the quantity being estimated does not have a fixed known value but is
itself uncertain.
3. In real-life problems, if the estimator has provided a range that was small but just missed the true
value, that estimate would still be more useful than another estimate with a much wider range but
that included the true value.
4. In real-life problems, estimators would presumably check formulae and conversion factors that they
were unsure of.
The scores should not be taken very seriously (I don't keep a record of the results). The exercise is
simply a good way to highlight some of the issues concerned in estimating. A more realistic exercise
would be to compare probabilistic estimates from an expert for real problems with the values that were
eventually observed. Of course, such an exercise could take many months or years to complete.


Class estimating exercise 2

The class is grouped in pairs and asked to give the same three-point estimate, as used for the above
exercise, of the total weight (mass) of the members of the class in kilograms, including myself, and
our total height in metres. While they are estimating, I go round the class and ask each member quietly
for their own measurements. At the end of the exercise, I draw up the estimates as in Figure 14.1 and
superimpose the true value. Then we discuss how each group produced its estimates. The following
points generally come out:
Three estimating techniques are usually used by the class:
1. Produce a three-point estimate of the distributions of height and mass for individuals in the
class and multiply by the number of people in the class. This logic is incorrect since it ignores
the central limit theorem, which states that the spread of the sum of a set of n variables is
proportional to √n, not n. It generally manages to encompass the true result but with a very
wide (and therefore inaccurate) range.
2. Produce a three-point estimate of each individual in the class and add up the minima to get the
final-estimate minimum, add up the most likely values to get the final-estimate most likely and
add up the maxima to get the final-estimate maximum. Again, this is incorrect since it ignores
the central limit theorem and therefore produces too wide a range.
3. Produce a three-point estimate of each individual in the class and then run a simulation to add
them up. Take the 5 %, mode and 95 % values of the simulation result as the final three-point
estimate. This generally has the narrowest range but is still quite likely to encompass the true
value.
There is often a dominant person in a pair who takes over the whole estimating, either because that
person is very enthusiastic or more familiar with the software or because the other person is a bit
laid back or quiet. This, of course, loses the value of being in pairs.
The estimators often forget to exclude themselves from their uncertainty estimates. They have given
me their measurements, so they should only assign uncertainty to the others' measurements and then
add their measurements to the total.
If the central limit theorem corrections are applied to the violating estimates, the class scores average
out at about 1.4 compared with the 1.8 it should have been (i.e. 2 * 90 %). In other words, their minimum
to maximum range, which was supposed to have a 90 % probability of including the true value, actually
had about a 70 % probability.
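A short simulation makes the central limit theorem point concrete: adding up the individual minima and maxima gives a far wider interval than the 5th to 95th percentile range of the simulated sum. The sketch below uses an illustrative three-point weight estimate for 20 people; the numbers are mine, not the class's.

# Sketch: why adding minima and maxima overstates the spread of a sum.
import numpy as np

rng = np.random.default_rng()
n_people, n_iter = 20, 10000
a, b, c = 55, 75, 110                         # illustrative min, most likely, max (kg)

naive_range = (n_people * a, n_people * c)    # technique 2: add up the extremes
sums = rng.triangular(a, b, c, size=(n_iter, n_people)).sum(axis=1)
simulated_range = (np.percentile(sums, 5), np.percentile(sums, 95))
print(naive_range, simulated_range)           # the simulated interval is much narrower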

14.2.2 Common heuristic biases and errors
The analyst should bear in mind the following heuristics that the expert may employ when attempting
to provide subjective estimates and that are potential sources of systematic bias and errors. These biases
are explained in considerably more detail in Hertz and Thomas (1983) and in Morgan and Henrion
(1990) (the latter includes a very comprehensive list of references).
Availability

This is where experts use their recollection of past occurrences of an event to provide an estimate. The
accuracy of their estimates is dictated by their ability to remember past occurrences of the event or


how easily they can imagine the event occurring. This may work very well if the event is a regular
part of their life, e.g. how much they spend on petrol. It also works well if the event is something
that sticks in their mind, e.g. the probability of having a flat tyre. On the other hand, it can produce
poor estimates if it is difficult for the experts to remember past occurrences of the event: for example,
they may not be able confidently to estimate the number of people they passed in the street that day
since they would have no interest in noting each passer-by. Availability can produce overestimates of
frequency if the experts can remember past occurrences very clearly because of the impact they had
on them. For example, if a computer manager was asked how often her mainframe had crashed in the
last two years, she might well overestimate the frequency because she could remember every crash and
the crises they caused, but, because of the clarity of her recollection ("it seems like only yesterday"),
include some crashes that happened well over 2 years ago and therefore overestimate the frequency as
a result.
The availability heuristic is also affected by the degree to which we are exposed to information. For
example: one might consider that the chance of dying in a motoring accident was much higher than
dying from stomach cancer, because car crashes are always being reported in the media and stomach
cancer fatalities are not. On the other hand, an older person may have had several acquaintances who
have died from stomach cancer and would therefore offer the reverse opinion.
Representativeness

One type of bias is the erroneous belief that the large-scale nature of uncertainty is reflected in small-scale sampling. For example, in the National Lottery, many would say I had no chance of winning if I
selected the consecutive numbers 16, 17, 18, 19, 20 and 21. The lottery numbers are randomly picked
each week so it is believed that the winning numbers should also exhibit a random pattern, e.g. 3, 11,
15, 21, 29 and 41. Of course, both sets of numbers are actually equally likely.
I once reviewed a paper that noted that, out of 200 houses fitted with a new type of gas supply piping
and tested over a period of a year and a half, one of those houses suffered a gas leak due to a rat
gnawing through the pipe. It concluded that there was a 1:300 chance of a "rodent attack" per house
per year. What should the answer have been?
A second type of representativeness bias is where people concentrate on an enticing detail of the
problem and forget the overall picture. In a frequently cited paper by Kahneman and Tversky, described
in Morgan and Henrion (1990), subjects in an experiment were asked to determine the probability of
a person being an engineer on the basis of a written description of that person. If they were given a
bland description that gave no clue to the person's profession, the answer given was usually 50:50,
despite being told beforehand that, of the 100 described people, 70 were lawyers and 30 were engineers.
However, when the subjects were asked what probability they would give if they had no description
of the person, they said 30 %, illustrating that they understood how to use the information but had just
ignored it.
Adjustment and anchoring

This is probably the most important heuristic of the three. Individuals will usually begin their estimate
of the distribution of uncertainty of a variable with a single value (usually the most likely value) and
then make adjustments for its minimum and maximum from that first value. The problem is that these
adjustments are rarely sufficient to encompass the range of values that could actually occur: the estimators
appear to be "anchored" to their first estimated value. This is certainly one source of overconfidence
and can have a dramatic impact on the validity of a risk analysis model.


14.2.3 Other sources of estimating inaccuracy
There are other elements that may affect the correct assessment of uncertainty, and the analyst should
be aware of them in order to avoid unnecessary errors.
Inexpert expert

The person nominated (wrongly) as being able to provide the most knowledgeable opinion occasionally actually has very little idea. Rather than referring the analyst on to another person more expert in the problem, that person may try to provide an opinion "to be helpful", even though that opinion is of little real value. The analyst, on recognising the inexpertness of the interviewee, should seek an alternative opinion, although the inexpertness may not become apparent until later.
Culture of the organisation

The environment within which people work may sometimes impact on their estimating. Sales people
will often provide unduly optimistic estimates of future sales because of the optimistic culture within
which they work. Managers may offer high estimates of running costs because, if they achieve a lower
operating cost, their organisation will view them favourably. The analyst should try to be aware of
any potential conflict and seek to eliminate it through cross-checking with data and other people in the
organisation.
Conflicting agendas

Sometimes the expert will have a vested interest in the values that are submitted to a model. In one
model I developed, managers were deliberately providing hugely optimistic growth rate predictions
to me because, in the organisation they worked for, it could aid their individual empire building. In
another, I was offered very optimistic estimates of completion time and costs for a project because, if
that project were given approval, the person in question would become the project's manager with a
big wage increase to match. Lawyers may offer a low estimate of the cost of litigation because, if they
get the brief, they can usually increase the fees later. The analyst must be aware of such conflicting
agendas and seek a second disinterested opinion.
Unwillingness to consider extremes

The expert will frequently find it difficult or be unwilling to envisage circumstances that would cause
a variable to be extremely low or high. The analyst will often have to encourage the development of
such extreme scenarios in order to elicit an opinion that realistically covers the entire possible range.
This can be done by the analyst dreaming up some examples of extreme circumstances and discussing
them with the expert.
Eagerness to say the right thing

Occasionally, interviewees will be trying to provide the answer they think the analyst wants to hear. For
this reason, it is important not to ask questions that are leading and never to offer a value for the expert
to comment on. For example, if I said "How long do you think this task will take? Twelve weeks?
More? Less?" I could well get an answer nearer to 12 weeks than if I had simply said "How long do
you think this task will take?".


Units used in the estimation

People are frequently confused between the magnitudes of units of measurement. An older (or English)
person may be used to thinking of distances in miles and liquid volumes in (UK) gallons and pints. If
the model uses SI units, the analyst should let the experts describe their estimates in the units in which
they are comfortable and convert the figures afterwards.
Expert too busy

People always seem to be busy and under pressure. A risk analyst coming to ask a lot of difficult questions
may not be very welcome. The expert may act brusquely or give the whole process lip service. Obvious
symptoms are when the expert offers oversimplistic estimates like X ± Y % or minimum, most likely
and maximum values that are equally spaced for all estimated variables. The solution to such problems
is to get the top management visibly to support the development of the risk model, ensuring that the
employees are given the message that this work is a priority.
Belief that the expert should be quite certain

It may be perceived by experts that assigning a large uncertainty to a parameter would indicate a lack
of knowledge and thereby undermine their reputation. The expert may need to be reassured that this is
not the case. An expert should have a more precise understanding of a parameter's true uncertainty and
may, in fact, appreciate that the uncertainty could be greater than the layperson would have expected.

14.3 Modelling Techniques
This section describes a range of techniques including the role of various types of probability distribution
that are useful in the eliciting of expert opinion. I have only included those techniques that have worked
for me, so the reader will find some omissions when comparing with other risk analysis texts.

14.3.1 Disaggregation
A key technique to eliciting distributions of opinion is to disaggregate the problem sufficiently well so
that experts can concentrate on estimating something that is tangible and easy to envisage. For example,
it will generally be more useful to ask experts to break down their company's revenue into logical
components (like region, product, subsidiary company, etc.) rather than to estimate the total revenue in
one go. Disaggregation allows the expert and analyst to recognise dependencies between components
of the total revenue. It also means that the risk analysis result will be less critically dependent on
the estimate of each model component. Aggregating the estimates of the various revenue components
will show a more complex and accurate distribution than ever could have been achieved by directly
estimating the sum. The aggregation will also take care of the effects of the central limit theorem
automatically - something that is extremely hard for experts to do in their head. Another benefit of
disaggregation is that the logic of the problem usually becomes more apparent and the model therefore
becomes more realistic.
During the disaggregation process, analysts should be aware of where the key uncertainties lie within
their model and therefore where they should place their emphasis. The analyst can check whether an
appropriate level of disaggregation has been achieved by running a sensitivity analysis on the model (see
Section 5.3.7) and looking to see whether the Tornado chart is dominated by one or two model inputs.


14.3.2 Distributions used in modelling expert opinion
This section describes the role of various types of probability distribution in modelling expert opinion.
Non-parametric and parametric distributions

Probability distribution functions fall into two categories: non-parametric and parametric distributions,
the meanings of which are discussed in detail in Appendix III.3. A parametric distribution is based
on a mathematical function whose shape and range is determined by one or more distribution parameters. These parameters often have little obvious or intuitive relationship to the distribution shapes they
define. Examples of parametric distributions are: lognormal, normal, beta, Weibull, Pareto, loglogistic,
hypergeometric - most distribution types, in fact.
Non-parametric distributions, on the other hand, have their shape and range determined by their
parameters directly in an obvious and intuitive way. Their distribution function is simply a mathematical
description of their shape. Non-parametric distributions are: uniform, relative, triangular, cumulative and
discrete.
As a rule, non-parametric distributions are far more reliable and flexible for modelling expert opinion
about a model parameter. The questions that the analyst poses to the expert to determine the distribution's parameters are intuitive and easy to respond to. Changes to these parameters also produce an
easily predicted change in the distribution's shape and range. The application of each non-parametric
distribution type to modelling expert opinion is discussed below.
There are three common exceptions to the above preference for using non-parametric distributions to model expert opinion:

1. The PERT distribution is frequently used to model an expert's opinion. Although it is, strictly speaking, a parametric distribution, it has been adapted so that the expert need only provide estimates of the minimum, most likely and maximum values for the variable, and the PERT function finds a shape that fits these restrictions. The PERT distribution is explained more fully below.
2. The expert may occasionally be very familiar with using the parameters that define the particular distribution. For example, a toxicologist may regularly determine the mean and standard error of a chemical concentration in a set of samples. It might be quite helpful to ask the expert for the mean and standard deviation of his or her uncertainty about some concentration in this case.
3. The parameters of a parametric distribution are sometimes intuitive and the analyst can therefore ask for their estimation directly. For example, a binomial distribution is defined by n, the number of trials that will be conducted, and p, the probability of success of each trial. In cases where I consider the binomial distribution to be the most appropriate, I generally ask the expert for estimates of n and p, recognising that I will have to insert them into a binomial distribution, but I would try to avoid any discussion of the binomial distribution that might cause confusion. Note that the estimates of n and p can also be distributions themselves.
There are other problems associated with using parametric distributions for modelling expert opinion:
A model that includes parametric distributions to represent opinion is more difficult to review later
because the parameters of the distribution may have no intuitive appeal.
It is very difficult to get the precise shape right when using parametric distributions to model expert
opinion as the effects of changes in the parameters are not usually obvious.


Figure 14.4 Examples of triangular distributions.

The triangular distribution

The triangular distribution is the most commonly used distribution for modelling expert opinion. It is
defined by its minimum (a), most likely (b) and maximum (c) values. Figure 14.4 shows three triangular
distributions: Triangle(0, 10, 20), Triangle(0, 10, 50) and Triangle(0, 50, 50), which are symmetric, right
skewed and left skewed respectively. The triangular distribution has a very obvious appeal because it
is so easy to think about the three defining parameters and to envisage the effect of any changes.
The mean and standard deviation of the triangular distribution are determined from its three parameters:

Mean = (a + b + c) / 3

Standard deviation = √((a² + b² + c² - ab - ac - bc) / 18)

From these formulae it can be seen that the mean and standard deviation are equally sensitive to all
three parameters. Many models involve parameters for which it is fairly easy to estimate the minimum
and most likely values, but for which the maximum is almost unbounded and could be enormous.
The central limit theorem tells us that, when adding up a large number of distributions (for example,
adding costs or task durations), it is the distributions' means and standard deviations that are most
important because they determine the mean and standard deviation of the risk analysis result. In situations
where the maximum is so difficult to determine, the triangular distribution is not usually appropriate
since it will depend a great deal on how the estimation of the maximum is approached. For example, if
the maximum is assumed to be the absolutely largest possible value, the risk analysis output will have
a far larger mean and standard deviation than if the maximum is assumed to be a "practical" maximum
by the estimating experts.
The triangular distribution is often considered to be appropriate where little is known about the
parameter outside an approximate estimate of its minimum, most likely and maximum values. On the
other hand, its sharp, very localised peak and straight lines produce a very definite and unusual (and
very unnatural) shape, which conflicts with the assumption of little knowledge of the parameter.
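The sensitivity of these moments to the maximum is easy to demonstrate numerically. The snippet below evaluates the two formulae above for an illustrative parameter, once with a "practical" maximum and once with a much larger "absolute" maximum.

# Mean and standard deviation of a Triangle(a, b, c), evaluated for two
# illustrative choices of maximum.
import math

def triangle_moments(a, b, c):
    mean = (a + b + c) / 3
    sd = math.sqrt((a*a + b*b + c*c - a*b - a*c - b*c) / 18)
    return mean, sd

print(triangle_moments(1, 2, 5))     # "practical" maximum
print(triangle_moments(1, 2, 50))    # "absolute" maximum: far larger mean and sd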


Figure 14.5 Example of a Trigen distribution.

There is another useful variation of the triangular distribution, called Trigen in @RISK and TriangGen in Risk Solver, for example. The Trigen distribution requires five parameters, Trigen(a, b, c, p, q), which have the following meanings:

a: the practical minimum
b: the most likely value
c: the practical maximum value
p: the probability that the parameter value could be below a
q: the probability that the parameter value could be below c

Figure 14.5 shows a Trigen(40, 50, 80, 5 %, 95 %) distribution, with the 5 % areas extending beyond
the minimum and maximum (40 and 80 here). The Trigen distribution is a useful way of avoiding
asking experts for their estimate of the absolute minimum and maximum of a parameter: questions that
experts often have difficulty in answering meaningfully since there may theoretically be no minimum
or maximum. Instead, the analyst can discuss what values of p and q the experts would use to define
"practical" minima and maxima respectively. Once this has been decided, the experts only have to give
their estimates for practical minimum, most likely and practical maximum for each estimated parameter,
and the same p and q estimates are used for all their estimates. One drawback is that the expert may
not appreciate the final range to which the distribution may extend, so it is wise to plot the distribution
and have it agreed by the expert before using it in the model.
The Tri1090 distribution, featured in @RISK, presumes that p and q are 10 and 90 % respectively,
which is generally about right, but I prefer to use the Trigen because it adapts to each expert's concept
of "practical".
The uniform distribution

The uniform distribution is generally a very poor modeller of expert opinion since all values within its
range have equal probability density, but that density falls sharply to zero at the minimum and maximum
in an unnatural way. The uniform distribution obeys the maximum entropy formalism (see Section 9.4)
where only the minimum and maximum are known, but in my experience it is rare indeed that the


expert will be able to define the minimum and maximum but have no opinion to offer on a most likely
value.
The uniform distribution does, however, have several uses:
to highlight or exaggerate the fact that little is known about the parameter;
to model circular variables (like the direction of wind from 0 to 2π) and other specific problems;
to produce spider sensitivity plots (see Section 5.3.8).
The PERT distribution

The PERT distribution gets its name because it uses the same assumption about the mean (see below)
as PERT networks (used in the past for project planning). It is a version of the beta distribution and
requires the same three parameters as the triangular distribution, namely minimum (a), most likely (b)
and maximum (c). Figure 14.6 shows three PERT distributions whose shape can be compared with
the triangular distributions of Figure 14.4. The equation of a PERT distribution is related to the beta
distribution as follows:

PERT(a, b, c) = Beta(α1, α2)(c - a) + a

where

α1 = 6(μ - a)/(c - a),    α2 = 6(c - μ)/(c - a)

and the mean

μ = (a + 4b + c)/6

Figure 14.6 Examples of PERT distributions.

Figure 14.7 Comparison of the standard deviation of Triangle(0, most likely, 1) and PERT(0, most likely, 1) distributions.

The last equation for the mean is a restriction that is assumed in order to be able to determine values for α1 and α2. It also shows how the mean for the PERT distribution is 4 times more sensitive to the most likely value than to the minimum and maximum values. This should be compared with the
triangular distribution where the mean is equally sensitive to each parameter. The PERT distribution
therefore does not suffer to the same extent the potential systematic bias problems of the triangular
distribution, that is, in producing too great a value for the mean of the risk analysis results where the
maximum for the distribution is very large.
The standard deviation of a PERT distribution is also less sensitive to the estimate of the extremes.
Although the equation for the PERT standard deviation is rather complex, the point can be illustrated
very well graphically. Figure 14.7 compares the standard deviations of triangular and PERT distributions
that have the same a, b and c values. To illustrate the point, the figure uses values of 0 and 1 for a
and c respectively and allows b to vary between 0 and 1, although the observed pattern extends to any
(a, b, c} set of values. You can see that the PERT distribution produces a systematically lower standard
deviation than the triangular distribution, particularly where the distribution is highly skewed (i.e. b is
close to 0 or 1 in this case). As a general rough rule of thumb, cost and duration distributions for project
tasks often have a ratio of about 2:1 between (maximum - most likely) and (most likely - minimum),
equivalent to b = 0.3333 in Figure 14.7. The standard deviation of the PERT distribution at this point is
about 88 % of that for the triangular distribution. This implies that using PERT distributions throughout
a cost or schedule model, or any other additive model, will display about 10 % less uncertainty than the
equivalent model using triangular distributions.
Some readers would perhaps argue that the increased uncertainty that occurs with triangular distributions will compensate to some degree for the "overconfidence" that is often apparent in subjective
estimating. The argument is quite appealing at first sight but is not conducive to the long-term improvement of the organisation's ability to estimate. I would rather see an expert's opinion modelled as precisely
as is practical. Then, if the expert is consistently overconfident, this will become apparent with time
and his/her estimating can be corrected.
The modified PERT distribution

The PERT distribution can also be manipulated to produce shapes with varying degrees of uncertainty
for the same minimum, most likely and maximum by changing the assumption about the mean:
Mean (μ) = (a + γb + c)/(γ + 2)


Figure 14.8 Examples of modified PERT distributions with varying most likely weighting γ.

In the standard PERT, γ = 4, which is the PERT network assumption that μ = (a + 4b + c)/6. However,
if we increase the value of γ, the distribution becomes progressively more peaked and concentrated
around b (and therefore less uncertain). Conversely, if we decrease γ, the distribution becomes flatter and
more uncertain. Figure 14.8 illustrates the effect of three different values of γ for a modified PERT(5,
7, 10) distribution. This modified PERT distribution can be very useful in modelling expert opinion. The
expert is asked to estimate the same three values as before (i.e. minimum, most likely and maximum).
Then a set of modified PERT distributions is plotted and the expert is asked to select the shape that fits
his/her opinion most accurately. It is a fairly simple matter to set up a spreadsheet program that will do
all this automatically.
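A sketch of one way to do that outside a spreadsheet (Python with numpy assumed, not the book's own tools): the modified PERT can be sampled as a rescaled beta distribution whose shape parameters follow from the mean restriction above.

    import numpy as np

    def modified_pert(a, b, c, gamma=4.0, size=10000, rng=None):
        # Modified PERT(a, b, c) with most-likely weighting gamma;
        # gamma = 4 reproduces the standard PERT, larger gamma is more peaked
        rng = np.random.default_rng() if rng is None else rng
        alpha1 = 1 + gamma * (b - a) / (c - a)
        alpha2 = 1 + gamma * (c - b) / (c - a)
        return a + (c - a) * rng.beta(alpha1, alpha2, size=size)

    # Figure 14.8-style comparison for a modified PERT(5, 7, 10)
    for gamma in (2, 4, 10):
        x = modified_pert(5, 7, 10, gamma)
        print(f"gamma = {gamma:2d}: mean = {x.mean():.2f}, sd = {x.std():.2f}")

Plotting the samples for a few values of gamma and letting the expert choose among the shapes mirrors the procedure described above.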
The relative distribution

The relative distribution (also called the general in @RISK, and a version of the Custom in Crystal Ball)
is the most flexible of all of the continuous distribution functions. It enables the analyst and expert to
tailor the shape of the distribution to reflect, as closely as possible, the opinion of the expert. The relative
distribution has the form Relative(minimum, maximum, {xi}, {pi}), where {xi} is an array of x values
with probability densities {pi} and where the distribution falls between the minimum and maximum.
The {pi} values are not constrained to give an area under the curve of 1, since the software recalibrates
the probability scale. Figure 14.9 shows a Relative(4, 15, {7, 9, 11}, {2, 3, 0.5}).
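For illustration, a minimal sketch of the same idea in Python (numpy assumed; not the @RISK or Crystal Ball implementation). It treats the points as a piecewise-linear density that falls to zero at the minimum and maximum, renormalises it, and samples by approximately inverting the resulting CDF; the zero end densities and the linear inversion between knots are simplifying assumptions of this sketch.

    import numpy as np

    def sample_relative(minimum, maximum, xs, ps, size=100000, rng=None):
        # Piecewise-linear density through (minimum, 0), (xs, ps), (maximum, 0),
        # renormalised so the area under the curve is 1, then sampled by an
        # approximate inverse-CDF transform (linear interpolation between knots)
        rng = np.random.default_rng() if rng is None else rng
        x = np.array([minimum, *xs, maximum], dtype=float)
        p = np.array([0.0, *ps, 0.0], dtype=float)
        areas = 0.5 * (p[1:] + p[:-1]) * np.diff(x)      # trapezium rule per segment
        cdf = np.concatenate([[0.0], np.cumsum(areas)])
        cdf /= cdf[-1]
        return np.interp(rng.uniform(size=size), cdf, x)

    # The Relative(4, 15, {7, 9, 11}, {2, 3, 0.5}) of Figure 14.9
    samples = sample_relative(4, 15, [7, 9, 11], [2, 3, 0.5])
    print(np.percentile(samples, [10, 50, 90]).round(2))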
The cumulative distribution

The cumulative distribution has the form CumulativeA(minimum, maximum, {xi}, {Pi}), where {xi} is an
array of x values with cumulative probabilities {Pi} and where the distribution falls between the minimum
and maximum. Figure 14.10 shows the distribution CumulativeA(0, 10, {1, 4, 6}, {0.1, 0.6, 0.8}) as it is
defined in its cumulative form and how it looks as a relative frequency plot. The cumulative distribution
is used in some texts to model expert opinion. However, I have found it largely unsatisfactory because
of the insensitivity of its probability scale. A small change in the shape of the cumulative distribution
that would pass unnoticed produces a radical change in the corresponding relative frequency plot that
would not be acceptable. Figure 14.11 provides an illustration: a smooth and natural relative frequency
plot (A) is converted to a cumulative frequency plot (B) and then altered slightly (C). Converting back
to a relative frequency plot (D) shows that the modified distribution is dramatically different to the
original, although this would almost certainly not have been appreciated by comparing the cumulative

Figure 14.9 Example of a relative distribution.

Figure 14.10 Example of a cumulative distribution and its relative frequency plot.

frequency plots. For this reason, I usually prefer to model expert opinion looking at the relative frequency
distribution instead.
One circumstance where the cumulative distribution is very useful is in attempting to estimate a
variable whose range covers several orders of magnitude. For example, the number of bacteria in 1 kg
of meat will increase exponentially with time. The meat may contain 100 units of bacteria or 1 million.
In such circumstances, it is fruitless to attempt to use a relative distribution directly. This point is
discussed more fully in Section 14.3.3.
The discrete distribution

The discrete distribution has the form Discrete({xi}, {pi}), where {xi} is an array of the possible values
of the variable with probability weightings {pi}. The {pi} values do not have to add up to unity as
the software will normalise them automatically. It is actually often useful just to consider the ratio of
likelihood of the different values and not to worry about the actual probability values. The discrete


Figure 14.11 Example of how small changes in a distribution's cumulative plot can dramatically affect its
shape.

distribution can be used to model a discrete parameter (that is, a parameter that may take one of two or
more distinct values), e.g. the number of turbines that will be used in a power station, and to combine
two or more conflicting expert opinions (see Section 14.3.4).

14.3.3 Modelling opinion of a variable that covers several
orders of magnitude
A continuous parameter whose uncertainty extends over several orders of magnitude generally cannot
be modelled in the usual manner. For example, an expert may consider that 1 g of meat could contain
any number of units of bacteria from 1 to 10 000 but that this figure is just as likely to be around 100
or 1000. If we were to model this estimate using a Uniform(1, 10 000) distribution, for example, we
would almost certainly not match the expert's opinion of the values of the cumulative percentiles. The
expert would probably place the 25, 50 and 75 percentiles at about 10, 100 and 1000, where our model
places them at 2500, 5000 and 7500 respectively. The reason for such a large discrepancy is that the
expert is subconsciously making his/her estimate in log-space, i.e. s/he is thinking of the log10 values:
log10(1) = 0, log10(10) = 1, log10(100) = 2, etc. To match the expert's approach to estimating, the analyst
can also work in log-space, so the distribution becomes

Number of units of bacteria = 10^Uniform(0, 4)
Figure 14.12 compares these two interpretations of the expert opinion by looking at the cumulative
distributions and statistics they would produce. The Uniform(1, 10 000) has a much larger mean and
standard deviation than the 10^Uniform(0, 4) distribution and an entirely different shape.


                  Uniform(1, 10 000)    10^Uniform(0, 4)
Mean                    5000.5                1085
Std Deviation           2886                  2062
Skewness                0                     2.4
Kurtosis                1.8                   5.2

Figure 14.12 Comparison of two ways to model expert opinion of a variable that covers several orders of
magnitude.

If the expert had said instead that there could be between 1 and 10 000 units of bacteria in 1 g
of meat, but the most likely number is around 500, we would probably have the greatest success in
modelling this variable as
Number of units of bacteria = 10^PERT(0, 2.7, 4)

where log10(500) ≈ 2.7.
If the variable is to be modelled as a 10^x type formula as described above, it is judicious to compare
the cumulative percentiles at a few sensible points with those the expert would expect. Any radical
differences would suggest that the expert is not actually thinking in log-space and the cumulative
distribution could be used instead.
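A quick numerical check of the log-space reasoning above (a Python sketch with numpy assumed, not the book's spreadsheet models):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    linear = rng.uniform(1, 10_000, n)           # Uniform(1, 10 000)
    log_space = 10 ** rng.uniform(0, 4, n)       # 10^Uniform(0, 4)

    for name, x in [("Uniform(1, 10 000)", linear), ("10^Uniform(0, 4)", log_space)]:
        q25, q50, q75 = np.percentile(x, [25, 50, 75])
        print(f"{name:20s} 25/50/75 percentiles: {q25:7.0f} {q50:7.0f} {q75:7.0f}")
    # The log-space model puts the quartiles near 10, 100 and 1000, matching the
    # expert's stated view; the linear model puts them near 2500, 5000 and 7500.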

14.3.4 Incorporating differences in expert opinions
Experts will sometimes produce profoundly different probability distribution estimates of a parameter.
This is usually because the experts have estimated different things, made differing assumptions or have
different sets of information on which to base their opinion. However, occasionally two or more experts
simply genuinely disagree. How should the analyst approach the problem? The first step is usually to
confer with someone more senior and find out whether one expert is preferred over the other. If those
more senior have some confidence in both opinions, a method is needed to combine these opinions in
some way.
Recommended approach

I have used the following method for a number of years with good results. Use a Discrete({xi}, {pi})
distribution where the {xi} are the expert opinions and the {pi} are the weights given to each opinion
according to the emphasis one wishes to place on them. Figure 14.13 illustrates an example combining

three differing opinions, but where expert A is given twice the emphasis of the others owing to the
greater experience of that expert.

Figure 14.13 Combining three dissimilar expert opinions.
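A sketch of this recommended approach in Python (numpy assumed; the three expert distributions and the weights below are hypothetical): on each iteration one expert is chosen according to the weights, and a value is then drawn from that expert's distribution.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 100_000

    # Hypothetical expert opinions, each given as (min, most likely, max) of a triangle
    expert_params = [(40, 55, 70), (50, 60, 80), (45, 65, 90)]
    weights = np.array([2.0, 1.0, 1.0])          # expert A given twice the emphasis
    weights = weights / weights.sum()            # normalise, as the software would

    # Discrete({experts}, {weights}): pick an expert per iteration, then sample from it
    chosen = rng.choice(len(expert_params), size=n, p=weights)
    samples = np.empty(n)
    for i, (a, b, c) in enumerate(expert_params):
        idx = chosen == i
        samples[idx] = rng.triangular(a, b, c, idx.sum())

    print(samples.mean().round(1), np.percentile(samples, [5, 50, 95]).round(1))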
Two incorrect approaches are frequently used:
Pick the most pessimistic estimate. This is generally unsatisfactory, as a risk analysis model should
be attempting to produce an unbiased estimate of the uncertainty. The caution should only be applied
at the decision-making stage after reviewing the risk analysis results.
Take the average of the two distributions. This is incorrect as the resultant distribution will be too
narrow. By way of illustration, consider the test situation where both experts believed a parameter
should be modelled by a Normal(100, 10) distribution. Whatever technique was used to combine their
opinions, the result should be the same Normal(100, 10) distribution. The average of these two distributions, i.e. AVERAGE(Normal(100, 10), Normal(100, 10)), would be a Normal(100, 10/√2) =
Normal(100, 7.07) from the central limit theorem. In other words, we would have produced far too
small a spread.

I have been offered suggestions for other approaches to this problem:
Take the weighted average of the relative or cumulative percentiles. This will correctly construct
the combined distribution (it is how the ModelRisk function VoseCombined works) but it is very
laborious to execute for all but the most simple distributions of opinion unless you have a library
of density and cdf functions, so it is somewhat impractical to start from scratch.
Multiply together the probability densities at each x value. This is incorrect because (a) it produces
combined distributions with exaggerated peakedness, (b) the area under the curve is no longer 1 and
(c) the combined distribution is contained between the highest minimum and the lowest maximum.


      B        C      D       E      F                            G
      SME      Min    Mode    Max    Distribution                 Weight
  3   Peter    11     13      17     VosePERT($C$3,$D$3,$E$3)     0.3
  4   Jane     12     13      16     VosePERT($C$4,$D$4,$E$4)     0.2
  5   Paul     8      10      13     VosePERT($C$5,$D$5,$E$5)     0.4
  6   Susan    9      10      15     VosePERT($C$6,$D$6,$E$6)     0.1
  8   Combined estimate               8.680244
  9   P(<14)                          0.878805

  Formulae table
  F3:F6         =VosePERTObject(C3,D3,E3)
  E8 (output)   =VoseCombined(F3:F6,G3:G6,B3:B6)
  E9 (output)   =VoseCombinedProb(14,F3:F6,G3:G6,B3:B6,1)

Figure 14.14 Combining weighted SME estimates using VoseCombined functions.

ModelRisk has the function VoseCombined({Distributions}, {Weights}) and related probability calculation functions that perform the combination described above. In the model in Figure 14.14, four
expert estimates are combined to construct the one estimate. The advantage of this function is that it
then allows one to perform a sensitivity analysis on the estimate as a whole: if you were to use
the Discrete({Distributions}, {Weights}) method, your Monte Carlo software would, in this case, be
performing a sensitivity analysis of five distributions: the four estimates and the discrete distribution,
which will dilute the perceived influence of the combined uncertainty.
In the model in Figure 14.14, the VoseCombined function generates random values from a distribution
constructed by weighting the four SME estimates. The weights do not need to sum to 1: they will be
normalised. The VoseCombinedProb(..., 1) function calculates the probability that this distribution will
take a value less than 14. Note that the names of the experts are an optional parameter: this simply
records who said what and has no effect on the calculation. However, if you select cell E8 and then click
the Vf (View Function) icon from the ModelRisk toolbar, you will get the graph shown in Figure 14.15,
which allows us to compare each SME's estimate and see how they are weighted.

14.4 Calibrating Subject Matter Experts
When subject matter experts (SMEs) are first asked to provide probabilistic estimates, they usually won't
be particularly good at it because it is a new way of thinking. We need some techniques that allow us
to help the SMEs gauge how well they are estimating and, over time, correct any biases they have. We
may also need a method for selecting between or weighting SMEs' estimates.
Imagine that an SME has estimated that a bespoke generator being placed on a ship will cost
$PERT(1.2, 1.35, 1.9)million, and we compare the actual outturn costs against that estimate. Let's
say it ended up costing $1.83 million. Did the SME provide a good estimate? Well, it fell within the
range provided, which is a good start, but it was at the high end, as Figure 14.16 shows.
The 1.83 value fell at the 99.97th percentile of the PERT distribution. That seems rather high considering the SME's estimate lay from 1.2 to 1.9 and 1.83 is only 90 % along that range, but it is the result
of how the PERT distribution interprets the minimum, mode and maximum values. The distribution is


Figure 14.15 Screen capture of graphic interface for the VoseCombined function used in the model of
Figure 14.14.

Figure 14.16 An SME estimate.

quite right skewed, in which case the PERT has a thin right tail - in fact it assigns only a 1 % probability
to values larger than 1.73.
For this exercise, however, we'll assume that the SME had seen the plots above and was comfortable
with the estimate. We can't be certain with just one data point that the SME tends to underestimate.
In areas like engineering, capital investment and project planning, one SME will often provide many


estimates over time, so let's imagine we repeat the exercise some 10 times and determine the percentile
at which each outturn cost lies on each corresponding distribution estimate. In theory, if our SME was
perfectly calibrated, these would be random samples from a Uniform(0, 1) distribution, so the mean
should rapidly approach 0.5. A Uniform(0, 1) distribution has a variance of 1/12, so the mean of 10
samples from a perfectly calibrated SME should, from the central limit theorem, fall on a Normal(0.5,
1/SQRT(12 * 10)) = Normal(0.5, 0.091287). If the 10 values average to 0.7, we can be pretty sure
that the SME is underestimating, since there is only a (1 - NORMDIST(0.7, 0.5, 0.091287, 1)) = 1.4 %
chance a perfectly calibrated SME would have produced a value of 0.7 or larger. Similarly, we can
analyse the variance of the 10 values. It should be close to 1/12: if the variance is smaller then the SME's
distributions are too wide, or, as is more likely, if the variance is larger then the SME's distributions
are too narrow. The above analysis assumes, of course, that all the estimates actually fell within the
SME's distribution range, which may well not be the case. The plots in Figure 14.17 can help provide
a more comprehensive picture.
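A sketch of this calibration check in Python (numpy and scipy assumed; the outturn percentiles below are hypothetical):

    import numpy as np
    from scipy.stats import norm

    # Hypothetical percentiles at which 10 outturn values fell on the SME's estimates
    percentiles = np.array([0.55, 0.72, 0.81, 0.64, 0.93, 0.48, 0.77, 0.69, 0.88, 0.60])

    n = len(percentiles)
    mean_p = percentiles.mean()
    se = 1 / np.sqrt(12 * n)            # sd of the mean of n Uniform(0, 1) samples

    # One-sided probability that a perfectly calibrated SME would average this high
    p_value = 1 - norm.cdf(mean_p, loc=0.5, scale=se)
    print(f"mean percentile = {mean_p:.2f}, p-value = {p_value:.3f}")

    # Variance check: should be close to 1/12 for a well-calibrated SME
    print(f"variance = {percentiles.var(ddof=1):.3f} (target 1/12 = {1/12:.3f})")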
Experts are also sometimes asked to estimate the probability that an event will occur, which is no
easy task. In theory one can roughly estimate how good a SME is at providing these estimates by
grouping estimated probabilities into bands (e.g. the same bands as in Figure 14.17) and determining
what fraction of those risk events actually occurred. Obviously, around 15 % of risks that were thought
to have between 10 % and 20 % chance of occurring should actually occur. However, this breaks down
at the lowest and highest categories because many identified potential risks are perceived to have a very
small probability of occurrence, so we will almost never actually have any observations.

14.5 Conducting a Brainstorming Session
When the initial structure of the problem has been decided and subjective estimates of the key uncertainties are now required, it is often very useful to conduct one or more brainstorming sessions with several
experts in the area of the problem being analysed. If the model covers several different disciplines,
for example engineering, production, marketing and finance, it may be better to hold a brainstorming
session for each discipline group as well as one for everybody.
The objectives of the brainstorming session are to ensure that everyone has the same information
pertinent to the problem and then to debate the uncertainties of the problem. In some risk analysis texts,
the analyst is encouraged to determine a distribution of each uncertain parameter during these meetings. I
have tried this approach and find it very difficult to do well because it relies very heavily on controlling
the group's dynamics: ensuring that the loudest voice does not get all the air time; encouraging the
individuals to express their own opinion rather than following the leader, etc. These meetings can also
end up dragging on, and some of the experts may have to leave before the end of the session, reducing
its effectiveness.
My aim in brainstorming sessions is to ensure that all those attending leave with a common perception
of the risks and uncertainties of the problem. This is achieved by doing the following:
Gathering all relevant information and circulating it to the attending experts prior to the meeting.
Presenting data in easily digested forms, e.g. using scatter plots, trend charts, statistics and histograms
wherever possible rather than columns of figures.
At the meeting, encouraging discussion of the variability and uncertainty in the problem, including
the logical structure and any correlations. Discussing scenarios that would produce extreme values


Figure 14.17 Histogram of SME outturn percentiles. Percentiles are grouped into 10 bands so roughly
10 % of the percentile scores should lie in each band (when there are a lot of scores). Expert A is well
calibrated. Expert B provides estimates that are too narrow and tends to underestimate. Expert C provides
estimates that are far too wide and tends to overestimate.


for the uncertain variables to get a feel for the true extent of the total uncertainty. Some of the
experts may also have extra information to add to the pot of knowledge.
The analyst, acting as chairperson, ensuring that the discussion is structured.
Taking minutes of the meeting and circulating them afterwards to the attendees.
After a suitable, but short, period for contemplation following the brainstorming session, the analyst
conducts individual interviews with each expert and attempts to determine their opinions of the uncertainty of each variable that was discussed. The techniques for eliciting these opinions are discussed
in Section 14.6.1. Since all the experts will have the same level of knowledge, they should produce
similar estimates of uncertainty. Where there are large differences between opinions, the experts can be
reconvened to discuss the issue. If no agreement can be reached, the conflicting opinions can be treated
as described in Section 14.3.4.
I believe that this procedure has several distinct benefits over attempting to determine distributions
during brainstorming sessions:

Each expert has been given the time to think about the problem.
They are encouraged to develop their own opinion after the benefit of discussion with the other
experts.
A quiet individual is given as much prominence as a dominating one.
Differences in opinion between experts are easier to identify.
The whole process can be conducted in a much more orderly fashion.

14.6 Conducting the Interview
Initial resistance

Expert opinion of the uncertainty of a parameter is generally determined in a one-to-one interview
between the relevant expert and the analyst developing the model. In preparing for such interviews,
analysts should make themselves familiar with the various techniques for modelling expert opinion
described earlier in this chapter. They should also be familiar with the various sources of biases and
errors involved in subjective estimation. The experts, in their turn, having been informed of the interviews
well in advance, should have evaluated any relevant information either on their own or in a brainstorming
session described above.
There is occasionally some initial resistance by the experts in providing estimates in the form of
distributions, particularly if they have not been through the process before. This may be because they
are unfamiliar with probability theory. Alternatively, they may feel they know so little about the variable
(perhaps because it is so uncertain) that they would find it hard enough to give a single point estimate
let alone a whole probability distribution.
I like to start by explaining how, by using uncertainty distributions, we are allowing the experts to
express their lack of certainty. I explain that providing a distribution of the uncertainty of a parameter
does not require any great knowledge of probability theory. Neither does it demand a greater knowledge
of the parameter itself than a single-point estimate - quite the reverse. It gives the experts a means to
express their lack of exact knowledge of the parameter. Where in the past their single-point estimates
were always doomed never to occur precisely, their estimates now using distributions will be correct if
the actual value falls anywhere within the distribution's range.


The next step is to discuss the nature of the parameter's uncertainty. I prefer to let the experts explain
how they see the logic of the uncertainty rather than impose on them a structure I may have had in
mind and then to model what I hear.
Opportunity to revise estimates

Experts are usually more comfortable about providing estimates if they are told before the interviews
that they have the opportunity to revise their estimates at a later date. It is also good practice to leave
the experts with a printed copy of each estimate and to get them to sign a copy for the analyst's records.
Note that the copy should have a date on it. This is important since the experts' opinion could change
dramatically after the occurrence of some event or the acquisition of more data.

14.6.1 Eliciting distributions of the expert opinion
Once the model has been sufficiently disaggregated, it is usually not necessary to provide very precise
estimates of each individual component of the model. In fact, three-point estimates are usually quite
sufficient, the three points being the minimum, most likely and maximum values the expert believes the
value could take. These three values can be used to define either a triangular distribution or some form
of PERT distribution. My preference is to use a modified PERT, as described in Section 14.3.2, because
it has a natural shape that will invariably match the expert's view better than a triangular distribution
would. The analyst should attempt to determine the expert's opinion of the maximum value first and
then the minimum, by considering scenarios that could produce such extremes. Then, the expert should
be asked for his/her opinion of the most likely value within that range. Determining the parameters in
the order (1) maximum, (2) minimum and (3) most likely will go some way to removing the "anchoring"
error described in Section 14.2.2.
Occasionally, a model will not disaggregate evenly into sufficiently small components, leaving the
model's outputs strongly affected by one or more individual subjective estimates. When this is the case,
it is useful to employ a more rigorous approach to eliciting an expert's opinion than a simple three-point
estimate. In such cases, the modified PERT distribution is a good start but, on review of the plotted
distribution, the expert might still want to modify the shape a little. This can be done with pen and
graph paper as shown in Figure 14.18. In this example, the marketing manager believes that the amount
of wool her company will sell next month will be at least 5 metric tons (mt), no more than 10 mt and
most probably about 7 mt. These figures are then used to define a PERT distribution that is printed

Figure 14.18 Graphing distribution of expert opinion.


out onto graph paper. On reflection, the manager decides that there is a little too much emphasis being
placed on the right tail and draws out a more realistic shape. The revised curve can then be converted to
a relative distribution and entered into the model. Crosses are placed at strategic points along the curve
so that drawing straight lines between these crosses will produce a reasonable approximation of the
distribution. Then the x- and y-axis values are read off for each point and noted. Finally, the manager
is asked to sign and date the figure for the records.
The above technique is flexible, quite accurate and reassuringly transparent to the expert being questioned. This technique can now also be done without the need for pen and paper, using RISKview
software.
Figure 14.19 illustrates the same example using RISKview. The PERT(5, 7, 10) distribution (top
panel) is moved to the Distribution Artist facility of RISKview and automatically converted into a
relative distribution (bottom panel) with a user-defined number of points (10 in this example). This
distribution can now be modified to better reflect the expert's opinion by sliding the points up and down.
The modified distribution can also immediately be viewed as an ascending or descending cumulative
frequency plot to allow the expert to see if the cumulative percentiles also make sense. When the final
distribution has been settled on, it can be directly inserted into a spreadsheet model at the click of an icon.

14.6.2 Subjective estimation of discrete probabilities
Experts will sometimes be called upon to provide an estimate of the probability of occurrence of a
discrete event. This is a difficult task for experts. It requires that they have some feel for probabilities
that is both difficult for them to acquire and to calibrate. If the discrete event in question has occurred
in the past, the analyst can assist by presenting the data and a beta distribution of the probabilities
possible from that data (see Section 8.2.3). The experts can then give their opinion based on the amount
of information available.
However, it is quite usual that past information has no relevance to the problem at hand. For example,
political analysts cannot look to past general election results for guidance in estimating whether the
Labour Party will win the next general election. They will have to rely on their gut feeling based on
their understanding of the current political climate. In effect, they will be asked to pick a probability out
of the air - a daunting task, complicated by the difficulty of having to visualise the difference between,
say, 60 and 70 %. A possible way to avoid this problem is to offer experts a list of probability phrases,
for example:


almost certain;
very likely;
highly likely;
reasonably likely;
fairly likely;
even chance;
fairly unlikely;
reasonably unlikely;
highly unlikely;
very unlikely;
almost impossible.


Figure 14.20 Visual aid for estimating probabilities: A = 1 %, B = 5 %, C = 10 %, D = 20 %, E = 30 %,
F = 40 %, G = 50 %, H = 60 %, I = 70 %, J = 80 %, K = 90 %, L = 95 %, M = 99 %.

The phrases are ranked in order and the experts are told of this ranking. They are then asked to select
a phrase that best fits their understanding of the probability of each event that has to be considered.
At the end of the session, they are also asked to match as many of the phrases as possible to visual
representations of probability. For example, matching a phrase to the probability of picking out a black
ball at random from the trays of Figure 14.20. Since we know the percentage of black balls in each
tray, we can associate a probability with each phrase and thus with each estimated event.

14.6.3 Subjective estimation of very low and very high probabilities
Risk analysis models occasionally incorporate very unlikely events, i.e. those with a very low probability
of occurrence. It is recommended that readers review Section 4.5.1 before deciding to incorporate rare
events into their model. The risk of the rare event is usually modelled as the probability of its occurrence
combined with a distribution of its impact should it occur. An example might be the risk of a large
earthquake on a chemical plant. The distribution of impact on the chemical plant (in terms of damage
and lost production, etc.) can be reasonably estimated since there is a basis on which to make the
estimation (the components most at risk in an earthquake, the cost of replacement, the time required to
effect the repairs, production rates, etc.). However, the probability of an earthquake is far less easy to
estimate. Since it is so rare, there will be very few recorded occurrences on which to base the estimate
of probability.
When data are not available to determine estimates of probability of very unlikely events, experts will
often be consulted for their opinion. Such consultation is fraught with difficulties. Experts, like the rest
of us, are very unlikely to have any feel whatsoever for low probabilities unless there is a reasonable
amount of data on which to base their estimates (in which case they can offer their opinion based around
the frequency of observed occurrences). The best that the experts can do is to make some comparison
with the frequency of other low-probability events whose probabilities are well defined. Figure 14.21


offers a list of well-determined low probabilities in a graphical format that the reader may find helpful
in this regard.

Figure 14.21 Illustration of a risk ladder (for the USA) to aid in expert elicitation (from Williams, 1999, with
the author's permission).

Annual risk of dying in the US (number of deaths per 1 000 000)

60 000   80-year-old dying before age 81
3000     Amateur pilot
2800     Heart disease
2000     All cancers; Parachutist
800      Fire fighter; Hang glider
590      Lung cancer
480      Stomach organ cancer
320      Pneumonia
220      Diabetes; Police officer
160      Motor vehicle accidents; Breast cancer
120      Suicide
80       Homicide
50       Falls
30       Accidental poisoning (drugs and medication)
20       Pedestrian killed by automobile
15       Drowning; Fires and burns
5        Firearms; Tuberculosis
2        Electric current; Railway accident
0.6      Airplane crash or accident
0.4      Floods
0.2      Lightning; Insect bite or sting
0.06     Hit on ground by falling airplane
0.04     Hurricane
This inaccuracy in estimating the probability of a rare event will have a very large impact on a risk
analysis. Consider two experts estimating the expected cost of the risk of a gas turbine failing. They
agree that it would cost the company about £600 000 ± £200 000 should it fail. However, the first
expert estimates the probability of the event as 1:1000/year and the second as 1:5000/year. Both see the
probability as very low, but the expected cost for the first estimate is 5 times that of the second, i.e.
£600 000 × 1/1000 = £600 compared with £600 000 × 1/5000 = £120.
An estimate of the probability of a rare event can sometimes be broken down into a series of
consecutive events that are easier to determine. For example, the failure of the cooling system of a
nuclear power plant would require a number of safety mechanisms all to fail at the same time. The
probability of failure of the cooling system is then the product of the probability of failure of each safety
mechanism, each of which is usually easier to estimate than the total probability of the event. As another
example, this technique is also enjoying increasing popularity in epidemiology for the assessment of the
risks of introducing exotic diseases to a country through imports. The imported entity (animal, vegetable
or product of either) must first have the disease. Then it must slip through any quality checks in its
country of origin. After that, it must still slip through quarantine checks in the importing country, and
finally it has to infect a potential host. Each step has a probability (which may often be broken down
even further) which is estimated, and these probabilities are then multiplied together to determine the
final probability of the introduction of the disease from one animal.
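A toy numerical sketch of that decomposition (in Python; the stage probabilities are entirely hypothetical):

    # Hypothetical per-animal probabilities for each step in the import pathway
    p_has_disease       = 0.02   # imported animal carries the disease
    p_passes_export     = 0.10   # slips through checks in the country of origin
    p_passes_quarantine = 0.05   # slips through quarantine in the importing country
    p_infects_host      = 0.30   # goes on to infect a potential host

    p_introduction = (p_has_disease * p_passes_export
                      * p_passes_quarantine * p_infects_host)
    print(f"P(introduction per imported animal) = {p_introduction:.2e}")   # 3.00e-05

Each stage probability is usually far easier to defend, and to update with new data, than a single direct estimate of the overall probability would be.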

Chapter 15

Testing and modelling causal relationships
Testing and modelling causal relationships is the subject of plenty of books. I recommend Pearl (2000),
Neapolitan (2004) and Shipley (2002) because they are thorough, fairly readable if you're good at
mathematics and take a practical viewpoint. The technical details of causal inference lie very firmly in
the domain of statistics, so I'll leave it to these books to explain them. In this chapter I want to look at
some practical issues of causality from a risk analysis perspective. The main impetus to including this
as a separate topic is to help you avoid some of the nonsense that I have come across over the years
while reviewing models and scientific papers, battling in court as an expert witness or just watching
the news on TV. There are a few simple, very practical and intuitive rules that will help you test a
hypothesised causal relationship.
Causal inference is mostly applied to health issues, although the thinking has potential applications
in other areas such as econometrics (in his book, Pearl laments the lack of rigorous causal thinking in
current econometric practices), so I am going to use health issues as examples in this chapter. We can
attempt to use a causal model to answer three different types of question:
Predictions - what will happen given a certain set of conditions?
Interventions - what would be the effect of controlling one or more conditions?
Counterfactuals - what would have happened differently if one or more conditions had been
different?

In a deterministic (non-random) world there is a straightforward interpretation to causality. CSI Miami
and derivatives, and all those medical dramas, are such fun programmes because we viewers try to figure
out what really happened - what caused this week's murder(s), and of course the programme always
finishes with a satisfyingly unequivocal solution. I was once stranded in a US airport hotel in which a
real-world CSI conference was taking place, and they were keen to tell me how their reality was rather
different. They don't have the flashy cars, cool clothes, ultrasophisticated equipment or trendy offices
bathed in moody light. More importantly, when they search a database of fingerprints, it comes up with
a list, if they're lucky, of a dozen or so possible candidates, probably with "whereabouts unknown".
For them, the truth is far more elusive.
In the risk analysis world we have to work with causal relationships that are usually probabilistic in
nature, for example:
the probability of having lung cancer within your life is x if you smoke;
the probability of having lung cancer within your life is y if you don't smoke.


We all know that x > y, which makes being a smoker a risk factor. But life is more complicated than
that: there is a biological gradient, meaning in this case the more you smoke, the more likely the cancer.
If we were to do a study designed to determine the causal relationship between smoking and cancer, we
should look not just at whether people smoked at all, but at how much a person has smoked, for how long
and in what way (cigars, cigarettes with or without filters, pipes, little puffs or deep inhaling, brand, etc).
Things are further complicated because people can change their smoking habits over time. How about:
the probability of having lung cancer within your life is a if you carry matches;
the probability of having lung cancer within your life is b if you don't carry matches.


I haven't done the study, but I bet a > b, although carrying matches should not be a risk factor. A
correct statistical analysis will determine the high correlation between carrying matches (or lighters) and
using tobacco products. A sensible statistician would figure out that matches should be removed from
the analysis. An uncontrolled statistical analysis can produce some silly results (imagine we had no
idea that tobacco could be related to cancer and didn't collect any tobacco-related data), so we should
always apply some disciplined thinking to how we structure and interpret a statistical model. We need
a few definitions to begin:
A risk factor is an aspect of personal behaviour or lifestyle, environment or characteristic thought
to be associated positively or negatively with a particular adverse condition.
A counterfactual world is an epidemiological hypothetical idea of a world similar to our own in all
ways but for which the exposure to the hazard, or people's behaviour or characteristics, or some
other change that affects exposure, has been changed in some way.
The population attributable risk (PAR) (aka population aetiological fraction, among many others) is
the proportion of the incidence in the population attributable to exposure to a risk factor. It represents
the fraction by which the incidence in the population would have reduced in a counterfactual world
where the effect associated with that risk factor was not present.
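As a toy illustration of that definition (the incidence figures below are hypothetical), the PAR follows directly from comparing the observed and counterfactual incidences:

    # Hypothetical illustration of the PAR definition
    observed_incidence       = 50 / 100_000   # cases per person per year, all sources
    counterfactual_incidence = 35 / 100_000   # incidence with the risk factor removed
    par = (observed_incidence - counterfactual_incidence) / observed_incidence
    print(f"PAR = {par:.0%}")                 # 30 % of incidence attributable to the factor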


These concepts are often used to help model what the future might look like if we were to eliminate a
risk factor, but we need to be careful as they technically only refer to the comparison of an observed world
and a counterfactual parallel world in which the risk factor does not appear - making predictions of the
future means that we have to assume that the future world would look just like that counterfactual one.
In figuring out the PAR, we may well have to consider interactions between risk factors. Consider
the situation where the presence of either of two risk factors gives an extremely high probability of the
risk of interest, and where a significant fraction of the population is exposed to both risk factors. In this
case there is a lot of overlap and an individual risk factor has less impact because the other risk factor is
competing for the same victims. On the other hand, exposure to two chemicals at the same time might
produce a far greater effect than either chemical alone. We talk about synergism and antagonism when
the risk factors work together or against each other respectively. Synergism is more common, so the PAR
for the combination of two or more risk factors is usually less than the sum of their individual PARs.

15.1 Campylobacter Example
A large survey conducted by CDC (the highly reputable Centers for Disease Control and Prevention) in
the United States looked at why people end up getting a certain type of food poisoning (campylobacteriosis). You get campylobacteriosis when bacteria called Campylobacter enter your intestine, find a



suitably protected location and multiply (form a colony). Thus, the sequence of events resulting in
campylobacteriosis must include some exposure to the bacteria, then survival of those bacteria through
the stomach (the acid can kill them), then setting up a colony. In order for us to observe the infection,
that person has to become ill. In order to identify the disease as campylobacteriosis, a doctor has to ask
for a stool sample, it has to be provided, the stool sample has to be cultured and the Campylobacter
have to be isolated and identified. Campylobacteriosis will usually resolve itself after a week or so of
unpleasantness, so many more people therefore have campylobacteriosis than a healthcare provider will
observe.
The US survey looked at behaviour patterns of people with confirmed cases, tried to match them
with others of the same sex, age, etc., known not to have suffered from a foodborne illness and looked
for patterns of differences. This is called a case-control study. Some of the key factors were as follows
(+ meaning positively associated with illness, - meaning negatively associated):
1. Ate barbecued chicken (+).
2. Ate in a restaurant (+).
3. Were male and young (+).
4. Had healthcare insurance (+).
5. Are in a low socioeconomic band (+).
6. There was another member of the family with an illness (+).
7. The person was old (+).
8. Regularly ate chicken at home (-).
9. Had a dog or cat (+).
10. Worked on a farm (+).

Let's see whether this matches our understanding of the world:
Factor 1 makes sense since Campylobacter naturally occur in chicken and are very frequently to be
found in chicken meat. People are also somewhat less careful with their hygiene and the cooking
is less controlled at a barbecue (healthcare tip: when you've cooked a piece of meat, place it on a
different plate than the one used to bring the raw meat to the barbecue).
Factor 2 makes sense because of cross-contamination in the restaurant kitchen, so you might eat a
veggie burger but still have consumed Campylobacter originating from a chicken.
Factor 3 makes sense because we guys tend not to pay much attention to kitchen practices when
we're young and start off rather hopeless when we first leave home.
Factor 4 makes sense in that, in the USA, visiting a doctor is expensive, and that is the only way
the healthcare will observe the infection.
Factor 5 maybe seems right because poorer people will eat cheaper-quality food and will visit
restaurants with higher capacity and lower standards (related to factor 2).
Factor 6 is obvious since faecal-oral transmission is a known route (healthcare tip: wash your hands
very well, particularly when you are ill).
Factor 7 makes sense because older people have a less robust immune system, but maybe they also
eat in restaurants more (less?) often, maybe they like chicken more, etc.


Factor 8 seems strange. It appears from a number of studies that if you eat chicken at home you
are less likely to get ill. Maybe that is because it displaces eating chicken at a restaurant, maybe it's
because people who cook are wealthier or care more about their food or maybe (the current theory)
it is because these people get regular small exposures to Campylobacter that boost their immune
system.
Factor 9 is trickier. Perhaps pet food contains Campylobacter, perhaps the animal gets uncooked
scraps, then cross-infects the family.
Factor 10 makes sense. People working in chicken farms are obviously more at risk, but a farm will
often have just a few chickens, or will buy in manure as fertiliser or used chicken bedding as cattle
feed. Other animals also carry Campylobacter.
Each of the above is a demonstrable risk factor because each passed a test of statistical significance
in this study (and others) and one can find a possible rational explanation. Of course, the possible
rational explanation is often to be expected because the survey was put together with questions that
were designed to test suspected risk factors, not the ones that weren't thought of. Note that the causal
arguments are often interlinked in some way, making it difficult to figure out the importance of each
factor in isolation. Statistical software can deal with this given the appropriate control.

15.2 Types of Model to Analyse Data
Data can be analysed in several different ways in an attempt to determine the magnitude of hypothesised
causal relationships between variables (possible risk factors). Note, these models will not ever prove a
causal relationship, just as it is not possible to prove a theory, only disprove it.

Neural nets - look for patterns within datasets between several variables associated with a set
of individuals. They can find correlations within datasets, and make predictions of where a new
observation might lie on the basis of values for the conditioning variables, but they do not have
a causal interpretation and tend to be rather black box in nature. Neural nets are used a lot in
profiling. For example, they are used to estimate the level of credit risk associated with a credit card
or mortgage applicant, or identify a possible terrorist or smuggler at an airport. They don't seek to
determine why a person might be a poor credit risk, for example, just match the typical behaviour
or history of someone who fails to pay their bills - things like having defaulted before, changing
jobs frequently, not owning a home.
Classification trees - can be used to break down case-control data to list from the top down the
most important factors influencing the outcome of interest. This is done by looking at the difference
in fraction of cases and controls that have the outcome of interest (e.g. disease) when they are split
by each possible explanatory variable. So, for example, having a case-control study of lung cancer,
one might find the fraction of people with lung cancer is much larger among smokers than among
non-smokers, which forms the first fork in the tree. Looking then at the non-smokers only, one might
find that the fraction of people with lung cancer is much higher for those who worked in a smoky
environment compared with those who did not. One continually breaks down the population splits,
figuring out which variable is the next most correlated with a difference in the risk until you run
out of variables or statistical significance.
Regression models - logistic regression is used a lot to determine whether there is a possible
relationship between variables in a dataset and the variable to be predicted. The probability of a


"success" (e.g. exhibiting the disease) of a dichotomous (two-possible-outcome) variable we wish
to predict, pi, is related to the various possible influencing variables by regression equations; for
example

pi = 1/(1 + exp(-(a + Σj bj xij)))

where subscript i refers to each observation, and subscript j refers to each possible explanatory
variable in the dataset, of which there are k in total. Stepwise regression is used in two forms:
forward selection starts off with no predictive variables and sequentially adds them until there is no
statistically significant improvement in matching the data; backward selection has all variables in
the pot and keeps taking away the least significant variable until the model's statistical predictive
capability begins to suffer. Logistic regression can take account of important correlations between
possible risk factors by including covariance terms. Like neural nets, it has no in-built causal thinking.
Bayesian belief networks (aka directed acyclic graphs) - visually, these are networks of nodes
(observed variables) connected together by arcs (probabilistic relationships). They offer the closest
connection to causal inference thinking. In principle you could let DAG software run on a set of
data and come up with a set of conditional probabilities - it sounds appealing and objectively hands
off, but the networks need the benefit of human experience to know the direction in which these
arcs should go, i.e. what the directions of influence really are (and if they exist at all). I'm a firm
believer in assigning some constraints to what the model should test, but make sure you know why
you are applying those constraints. To quote Judea Pearl (Pearl, 2000): "[C]ompliance with human
intuition has been the ultimate criterion of adequacy in every philosophical study of causation, and
the proper incorporation of background information into statistical studies likewise relies on accurate
interpretation of causal judgment".

Commercial software is available for each of these methods. The algorithms they use are often
proprietary and can give different results on the same datasets, which is rather frustrating and presents
some opportunities to those who are looking for a particular answer (don't do that). In all of the above
techniques, it is important to split your data up into a training set and a validation set to test whether
the relationships that the software finds in the training set will let you reasonably accurately (i.e. at the
decision-maker's required accuracy) predict the outcome observations in the validation dataset. Best
practice involves repeated random splitting of your data into training and validation sets.
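For instance, a minimal sketch of that train/validation discipline in Python (scikit-learn assumed; the case-control-style data here are simulated, not the CDC dataset):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 2000

    # Simulated data: two genuine risk factors and one variable with no effect
    X = rng.normal(size=(n, 3))
    logit = -1.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1]       # column 2 carries no signal
    y = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

    # Repeated random splits into training and validation sets
    scores = []
    for seed in range(20):
        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=seed)
        model = LogisticRegression().fit(X_tr, y_tr)
        scores.append(model.score(X_va, y_va))         # validation accuracy
    print(f"validation accuracy: {np.mean(scores):.2f} +/- {np.std(scores):.2f}")

The spread of validation scores across splits gives a feel for whether the fitted relationships generalise or are merely artefacts of one particular split.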

15.3 From Risk Factors to Causes
Let's say that you have completed a statistical analysis of your data and your software has come up with
a list of risk factors. The numerical outputs of your statistical analysis will allow you to calculate PAR
for each factor, and here you should apply a little common sense because PAR relates to the decision
question you are answering.
Let me take the campylobacteriosis study as an example. You first need to know a couple of things
about Campylobacter. It does not survive long outside its natural host (animals like chickens, ducks
and pigs where it causes no illness) and so it does not establish reservoirs in the ground, in water,
etc. It also does not generally stay long in a human gut, although many people could be harbouring


the bacteria unknowingly. This means that, if we were to eliminate all the Campylobacter at their
animal sources, we would no longer have human campylobacteriosis cases (ignoring infections from
travelling). I was lead risk analyst for the US FDA where we wanted to estimate the number of people
who are infected with fluoroquinolone-resistant Campylobacter from poultry - fluoroquinolone is used
to treat poultry (particularly chickens) with the respiratory disease they get from living in sheds with
poor ventilation so the ammonia from their urine strips out the lining of their lungs. We reasoned:
if say 100 000 people were getting campylobacteriosis from poultry, and say 10% of the poultry
Campylobacter were fluoroquinolone resistant, then about 10 000 were suffering campylobacteriosis
that would not be treatable by administering fluoroquinolone (the antimicrobial is also often used to
treat suspected cases of food poisoning). We used the CDC study and their PAR estimates. The case
ended up going to court, and a risk analyst hired by the opposing side (the drug sponsor, who sold a
lot more of their antimicrobial to chicken farms than to humans) got the CDC data under the Freedom
of Information Act and did a variety of statistical analyses using various tools. He concluded: "A more
realistic assessment based on the CDC case-control data is that the chicken-attributable fraction for
[the pathogen] is between - 11.6 % (protective effect) and 0.72 % (not statistically significantly different
from zero) depending on how missing data values are treated". In other words, he is saying with this
-11.6 % attributable fraction figure that chicken is protective, so in a counterfactual world without
chicken contaminated with Campylobacter there would be more campylobacteriosis, i.e. if we could
remove the largest source of exposure we have to Campylobacter (poultry), more people would get ill.
Put another way, he believes that the Campylobacter on poultry are protective, but the Campylobacter
from other sources are not.
Using classification trees, for example, he determined that the major risk factors were, in descending
order of importance: visiting a farm, travelling, having a pet, drinking unprocessed water, being male
(then eating ground beef at home, eating pink hamburgers and buying raw chicken) or being female
(and then having no health insurance, eating high levels of fast food, eating hamburgers at home . . .
and finally, eating fried chicken at home). Note that chicken is at the bottom of both sequences.
So how did this risk analyst manage to justify his claim that eating chicken was actually protective - it
did not pose a threat of campylobacteriosis? He did so by misinterpreting the risk factors. There is
really no sense in considering a counterfactual world where people are all neuter (neither male nor
female) - and anyway, since we don't have any of those, we have no idea how their behaviour will be
different from males or females. Should we really be including whether people have insurance as a risk
factor to which we assign a PAR? I think not. It is perhaps true that all these factors are associated with
the risk - meaning that the probability of campylobacteriosis is correlated with each factor, but they
are not risk factors within the context of the decision question. I don't think that by paying people's
health insurance we would likely change the number of illnesses, although we would of course change
the number reported and treated. What we hope to achieve is an understanding of how much disease
is caused by Campylobacter from chicken, so the level of total human illness needs to be distributed
among the sources of Campylobacter. That brings some focus to the PAR calculations: dining in a
restaurant is only a risk factor because Campylobacter is in the restaurant kitchen. How did it get
there? Probably chickens mostly, but also ducks and other poultry, although the US eat those in far
lower volumes. It could also sometimes be a kitchen worker with poor hygiene unknowingly carrying Campylobacter, but where did that worker originally get the infection? Most probably from chicken. The sex¹ of a person is no longer relevant. Having a pet (it was mostly puppies) is a debatable point, since the puppy probably became infected from contaminated meat rather than being a natural carrier itself.

¹ Not "gender", which I found out one day listening to a debate in the UK House of Lords is what one feels oneself to be, while "sex" is defined by the reproductive equipment with which we are born.

Looking just at Campylobacter sources, we get a better picture, and, although regular small amounts
of exposure (eating at home) may be protective, this is protecting against other mostly chicken-derived
Campylobacter exposure and we end up with the same risk attribution that CDC determined from its
own survey data.
We won the court case, and the other risk analyst's testimony was, very unusually, rejected as being
unreliable - in no small part because of his selective and doctored quoting of papers.

15.4 Evaluating Evidence
The first test of causality you should make is to consider whether there is a known or possible causal
mechanism that can connect two variables together. For this, you may need to think out of the box:
the history of science is full of examples where people considered something impossible, in spite of an
enormous amount of evidence to the contrary, because they were so firmly attached to their pet theory.
The second test is temporal ordering: if a change in variable A has an effect on variable B, then the
change in A should occur before the resultant change in B. If a person dies of radiation poisoning (B)
then that person must have received a large dose of radiation (A) at some previous time. We can often
test for temporal ordering with statistics, usually some form of regression. But be careful: temporal ordering doesn't imply a causal relationship. Imagine you have a variable X that affects variables A
and B, but B responds faster than A. If X is unobserved, all we see is that A exhibits some behaviour
that strongly correlates in some way with the previous behaviour of B.
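A small simulation makes the trap easy to see. In the sketch below (all series, lags and noise levels are invented for the illustration) a hidden driver X feeds both A and B, with A responding two periods later than B, so B appears to lead A even though neither causes the other:

import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Hidden common driver X
x = np.cumsum(rng.normal(size=n))

# B responds to X immediately, A responds two periods later; neither causes the other
b = x + rng.normal(scale=0.5, size=n)
a = np.roll(x, 2) + rng.normal(scale=0.5, size=n)
a[:2] = a[2]   # tidy up the values wrapped around by np.roll

# B at time t correlates strongly with A at time t + 2, mimicking temporal ordering
lag = 2
corr = np.corrcoef(b[:-lag], a[lag:])[0, 1]
print(f"corr(B_t, A_t+{lag}) = {corr:.2f}")   # high, despite no causal link from B to A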
The third test is to determine in some way the size of the possible causal effect. That's where statistics comes in. From a risk analysis perspective, we are usually interested in what we can change about
the world. That ultimately implies that we are only really interested in determining the magnitude of
the causal relationships between variables we can control and those in which we are interested. Risk
analysts are not scientists - our job is not to devise new theories but to adapt the current scientific (or
financial, engineering, etc.) knowledge to help decision-makers make probabilistic decisions. However,
as a breed, I like to think that we are quite adept at stepping back and asking whether a tightly held
belief is correct, and then posing the awkward questions. It's quite possible that we can come up with
an alternative explanation of the world supported by the available evidence, which is fine, but that
explanation has to be presented back to the scientific community for their blessing before we can rely
on it to give decision-making advice.

15.5 The Limits of Causal Arguments
My son is just starting his "Why?" phase. I can see the interminable conversations we will have: "Papa, why does a plane stay in the air?" "Because it has wings." "Why?" "Because the wings hold it up." "Why?" "Because when an airplane goes fast the wind pushes the wings up." "Why?" Dim memories of Bernoulli's equation won't be of much help. "I don't know" is the inevitable end to the conversation. I can see why kids love this game - once we get to three or four answers, we parents reach the limit of our understanding. He's soon going to find out I don't know everything after all, and I'll plummet from my pedestal (he's already realised that I can't mend everything he breaks).
Causal thinking is the same. At some point we are going to have to accept the existence of the causal
relationships we are using without really knowing why. If we're lucky, the causal link will be supported
by a statistical analysis of good data, some experiential knowledge and a feeling that it makes sense. If we
go back far enough, all that we believe we know is based on assumptions. My point is that, when you have completed your causal analysis, try to be aware that the analysis will always be based on some assumptions, so sometimes a simple analysis is all you need to get the necessary guidance for your problem.

15.6 An Example of a Qualitative Causal Analysis
Our company does a lot of work in the field of animal health, where we help determine the risk of
introducing or exacerbating animal and human disease by moving animals or their products around the
world. This is a very well-developed area of risk analysis, and a lot of models and guidelines have
been written to help ensure that there is a scientifically based rationale for accepting, rejecting and
controlling such risks. Chapter 22 discusses animal health risk analysis. I present a risk analysis below
as an illustration of the need for a healthy cynicism when reviewing scientific literature and official
reports, and as an example of a causal analysis that I performed with absolutely no quantitative data for
an issue for which we do not yet have a complete understanding.

15.6.1 The problem
A year ago I was asked to perform a risk analysis on a particularly curious problem with pigs. Post-weaning multisystemic wasting syndrome (PMWS) affects pigs after they have finished suckling. I had
had some dealings with this problem before in another court case. The "syndrome" part of the name
means a pattern of symptoms, which is the closest veterinarians can come to defining the disease since
nobody knows for sure what the pathogen is that creates the problem. Until recently there hasn't even
been an agreed definition of what the pattern of symptoms actually is. A herd case definition for PMWS
was recently agreed by an EU-funded consortium (EU, 2005) led by Belfast University. The PMWS
case definition on herd level is based on two elements - (1) the clinical appearance in the herd and
(2) laboratory examination of necropsied (autopsy for animals) pigs suffering from wasting.

1. Clinical appearance on herd level
The occurrence of PMWS is characterised by an excessive increase in mortality and wasting post weaning compared with the historical level in the herd. There are two options for recognising this increase, of which 1a should be used whenever possible:
1a. If the mortality has been recorded in the herd, then the increase in mortality may be recognised in either of two ways:
1. Current mortality ≥ mean of historical levels in previous periods + 1.66 standard deviations.
2. Statistical testing of whether or not the mortality in the current period is higher than in the previous periods by the chi-square test.
In this context, mortality is defined as the prevalence of dead pigs within a specific period of time. The current time period is typically 1 or 2 months. The historical reference period should be at least 3 months.
1b. If there are no records of the mortality in the herd, an increase in mortality exceeding the national or regional level by 50 % is considered indicative of PMWS.
2. Pathological and histopathological diagnosis of PMWS
Autopsy should be performed on at least five pigs per herd. A herd is considered positive for PMWS
when the pathological and histopathological findings, indicative for PMWS, are all present at the
same time in at least one of the autopsied pigs. The pathological and histopathological findings are:

Chapter 15 Testing and modelling causal relationships

43 1

1. Clinical signs including growth retardation and wasting. Enlargement of inguinal lymph nodes,
dyspnoea, diarrhoea and jaundice may be seen sporadically.
2. Presence of characteristic histopathological lesions in lymphoid tissues: lymphocyte depletion
together with histiocytic infiltration and/or inclusion bodies and/or giant cells.
3. Detection of PCV2 in moderate to massive quantity within the lesions in lymphoid tissues of
affected pigs (basically using antigen detection in tissue by immunostaining or in situ hybridisation).
Other relevant diagnostic procedures must be carried out to exclude other obvious reasons for high
mortality (e.g. E. coli post-weaning diarrhoea or acute pleuropneumonia).
The herd case definition is highly unusual: a result of the lack of identification of the pathogenic
organism. It will need revision when more is known about the syndrome. The definition is also vulnerable
from a statistical viewpoint. To begin with, the definition acknowledges the wasting symptom in PMWS,
but the definitions only apply to mortality. PMWS can only be defined at a herd level because one has
statistically to differentiate the increase in rate of mortality and wasting post weaning from historical
levels in the herd or from other unaffected herds. Thus, for example, PMWS can never be diagnosed
for a backyard pig using this definition.
The chi-square test quoted above is based on making a normal approximation to a binomial variable.
The approximation is only good if one has a sufficiently large number of animals n in a herd and a
sufficiently high prevalence p of mortality or wasting in both unaffected and affected herds. Thus, it
becomes progressively more difficult to differentiate an affected from an unaffected herd where the herd
is small. The alternative requirement of prevalence at > 1.66 standard deviations above previous levels
and the chi-square table provided in this definition are determined by assuming that one should only
diagnose that a herd has PMWS when one is at least 95 % confident that the observed prevalence is
greater than normal. This means that one can choose to declare a herd as PMWS positive when one is
only 95 % confident that the fraction of animals dying or wasting is greater than usual. While one needs
to set a standard confidence for consistency, this is illustrative of the difference in approach between
statistics and risk analysis: in risk analysis one balances the cost associated with correct and incorrect
diagnosis and chooses a confidence level that minimises losses.
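To make the two recognition criteria concrete, the sketch below applies both the "mean + 1.66 standard deviations" rule and a normal-approximation comparison of proportions (the approximation underlying the chi-square criterion) to invented monthly mortality records for a single herd; the numbers and the one-sided 95 % cut-off are assumptions for the illustration, not part of the EU case definition itself:

import numpy as np
from scipy import stats

# Invented records: monthly deaths and pigs at risk for one herd
hist_deaths = np.array([8, 6, 9, 7, 10, 8])        # historical reference months
hist_at_risk = np.array([400, 400, 410, 395, 405, 400])
cur_deaths, cur_at_risk = 22, 400                  # current month

hist_prev = hist_deaths / hist_at_risk
cur_prev = cur_deaths / cur_at_risk

# Criterion 1: current prevalence >= historical mean + 1.66 standard deviations
threshold = hist_prev.mean() + 1.66 * hist_prev.std(ddof=1)
print("rule 1 flags the herd:", cur_prev >= threshold)

# Criterion 2: normal approximation to the binomial (one-sided comparison of proportions)
p_pool = (hist_deaths.sum() + cur_deaths) / (hist_at_risk.sum() + cur_at_risk)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / hist_at_risk.sum() + 1 / cur_at_risk))
z = (cur_prev - hist_deaths.sum() / hist_at_risk.sum()) / se
print("rule 2 flags the herd:", z > stats.norm.ppf(0.95))

With a small herd the standard error grows and, as noted above, it becomes progressively harder for either rule to flag a genuinely affected herd.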
The definition has other statistical issues; for example, the use of prevalence assumes that a population
is static (all in, all out) within a herd, rather than a continuous flow. It also does not take into account
the possible effects of a deteriorated farm management that would raise the mortality and wasting rates,
nor of an improved farm management whose improvements would balance against, and therefore mask,
the increased mortality and wasting due to PMWS.
Other definitions of PMWS have been used. New Zealand, for example, made their PMWS diagnosis
on the basis of at least a 15 % post-weaning mortality rate together with characteristic histopathological
lesions and the demonstration of PCV2 antigen in tissues. Denmark diagnoses the disease in a herd on
the basis of histopathology and demonstration of PCV2 antigen in pigs with or without clinical signs
indicative of PMWS and regardless of the number of animals.

15.6.2 Collecting information
PMWS is a worldwide problem among domestic pig populations. It is very difficult to compare experiences in different countries because there hasn't been a single agreed definition until recently, and
there are different motivations involved for reporting the problem. In one country I investigated,
farmers were declaring they had PMWS with, it seemed, completely new symptoms - but when I talked confidentially to people "on the ground" I found out that, if the problem were declared to be PMWS, the farmers would be completely compensated by their government, whereas if it were another, more obvious issue they would not. Another country I investigated declared that it was completely free of PMWS, which seemed extraordinary given the ubiquitous nature of the problem and
that genetically indistinguishable PCV2 had been detected at similar levels to other countries battling
with PMWS. But the pig industry of this country wanted to keep out pork imports and their freedom
from the ubiquitous PMWS was a good reason justified under international trading law. The country used a different (unpublished) definition of PMWS that included the necessity of observing an
increased wasting rate, and I was told that in their one suspected herd the pigs that were wasting were
destroyed prior to the government assessment, with the result that the required wasting rate was not
observed.
The essence of my risk analysis was to try to determine which, if any, of the various causal theories
could be true and then determine whether one could find a way to control the import risk for our clients
given the set of plausible theories. The main impediment to doing so was that it seemed every scientist
investigating the problem had their own pet theory and completely dismissed the others. Moreover,
they conducted experiments designed to affirm their theory, rather than refute it. I distilled the various
theories into the following components:
Theory 1. PCV2 is the causal agent of PMWS in concert with a modulation of the pig's immune
system.
Theory 2. A mutation (or mutations) of PCV2 is the causal agent (sometimes called PCV2A).
Theory 3. PCV2 is the causal agent, but only for pigs that are genetically more susceptible to the
virus.
Theory 4. An unidentified pathogen is the causal agent (sometimes called Agent X).
Theory 5. PMWS does not actually exist as a unique disease but is the combination of other clinical
infections.

Note that the five theories are not all mutually exclusive - one theory being true does not necessarily
imply that the other theories are false. Theory 1 could be true together with theories 2 or 3 or both.
Theories 2 and 3 are true only if theory 1 is true, and theories 4 and 5 eliminate the possibility of all
other theories. A theory of causality can never be proved, only disproved - an absence of observation
of a causal relationship cannot eliminate the possibility of that relationship. The five theories with their
partial overlap were structured to provide the most flexible means for evaluating the cause of PMWS. I
did a review of all (15) pieces of meaningful evidence I could find and categorised the level of support
that each gave to the five theories as follows:

conflicts (C), meaning that the observations in this evidence would not realistically have occurred
if the theory being tested was correct;
neutral (N), meaning that the observations in this evidence provide no information about the theory
being tested;
partially supports (P), meaning that the observations in this evidence could have occurred if the
theory being tested was correct, but other theories could also account for the observations;
supports (S), meaning that the observations in this evidence could only have occurred if the theory
being tested was correct.


15.6.3 Results and conclusions
The results are presented in Table 15.1.


Theory 1 (PCV2 immune system modulation causes PMWS). This theory is well supported by
the available evidence. It explains the onset of PMWS post weaning and the presence of other
infections, or vaccines, stimulating the immune system as being cofactors. It explains how the use
of more stringent sanitary measures in a farm can help contain and avoid PMWS. On its own it
does not explain the radially spreading epidemic observed in some countries, nor the difference in
susceptibility observed between pigs and pig breeds.
Theory 2 (PCV2A). This theory is also well supported by the available evidence. It explains the
radially spreading epidemic observed in some countries but does not explain the difference in susceptibility observed between pigs and between pig breeds.
Theory 3 (PCV2 genetic susceptibility). This theory is supported by the small amount of data
available. It could explain the targeting of certain herds over others and the difference in attack rates
between pig breeds.
Theory 4 (Agent X). This theory is unanimously contradicted by all the available evidence that could
be used to test it.
Theory 5 (PMWS does not actually exist). This theory is unanimously contradicted by all the available
evidence that could be used to test it.


As a result, I concluded (rightly, or wrongly, at the time of writing we still don't know the truth) that
it appears from the available evidence that PMWS requires at least two components to be established:

1. A mutated PCV2 that is more pathogenic than the ubiquitous strain(s). There may well be several
different localised mutations of PCV2 in the world's pig population that have varying levels of
pathogenicity. This would in part explain the high variance in attack rates in different countries,
although farm practices, pig genetics and other disease levels will be confounders.
Table 15.1 Comparison of theories on the relationship between PCV2 and PMWS and the available evidence (S = supports; P = partially supports; N = neutral; C = conflicts).


2. Some immune response modulation, due to another disease, stress, a live vaccine, etc. The theory that PMWS requires an immune system modulation is particularly well supported by the data, both from in vitro and in vivo experiments and from field observations that co-infection and stress are major risk factors.
There is also some limited, but very convincing, evidence (Evidence 15) from Ghent University (by coincidence, the town I live in) that the onset of PMWS is related to a third factor:
3. Susceptibility of individual pigs to the mutated virus. The evidence collected for this report suggests that the variation in susceptibility, while genetic in nature, is not obviously linked to the parents of a pig. The apparent variation in susceptibility owing to breed may mean that susceptibility can be inherited over many generations, i.e. that there will be a statistically significant difference over many generations, but the variation between individuals in a single litter would exceed the generational inherited variation.

15.7 Is Causal Analysis Essential?
In human and animal health risk assessment, we attempt to determine the causal agent(s) of a health
impact. Once determined, one then attempts to apportion that risk among the various sources of the
causal agent(s), if there is more than one source. Some risk analysts, particularly in the area of human
health, argue that a causal analysis is essential to performing a correct risk analysis.
The US Environmental Protection Agency, for example, in its guidelines on hazard identification, describes the first step in its risk analysis process: "The objective of hazard identification is to determine whether the available scientific data describe a causal relationship between an environmental agent
and demonstrated injury to human health or the environment". Their approach is understandable. It is
extremely difficult to establish any causal relationship between a chemical and any human effect that
can arise owing to chronic exposure to that chemical (e.g. a carcinogen), since many chemicals can
precipitate the onset of cancer and that may only eventuate after many years of exposure, probably to
many different carcinogens. We can't start by assuming that all chemicals can cause cancer. On the
other hand, we may fail to identify many carcinogens because the data and scientific understanding are
not there. If we are to protect the population and environment, we have to rely on suspicion that a
chemical may be carcinogenic because of similarities with other known carcinogens and act cautiously
until we have the evidence that eliminates that suspicion.
In microbial risk assessment, the problem is simpler either because an exposure to bacteria will
immediately result in infection or because the bacteria will pass through the human gut without effect,
and cultures of stools or blood analyses will usually tell us which bacterium has caused the infection.
By definition, Campylobacter causes campylobacteriosis, for example, so that the risk of campylobacteriosis must logically be distributed among the sources of Campylobacter, because if all sources of
Campylobacter were removed in a counterfactual world there would be no more campylobacteriosis.
I am of the view that we should definitely take the first step of hazard identification and attempt to
amass causal evidence, but the lack of evidence should not lead us to dismiss a suspected hazard from
concern, although clear evidence of a lack of causality should. We should also perform broad causal
studies with an open mind because, although a strong though unsuspected statistical inference does not
prove a causal relationship, finding one may nevertheless offer some lines of investigation leading to
discovery of previously unidentified hazards.

Chapter 16

Optimisation in risk analysis
by Dr Francisco Zagmutt, Vose Consulting US

16.1 Introduction
Analysts are often faced with the question of how to find a combination of values for interrelated decision
variables (i.e. variables that one can control) that will provide an optimal result. For example, a bakery
may want to know the best combination of materials to make good bread at a minimum price; a portfolio
manager may want to find the asset allocation that yields the highest returns for a certain level of risk;
or a medical researcher may want to design a battery of tests that will provide the most accurate results.
The purpose of this chapter is to introduce the reader to the basic principles of optimisation methods
and their application in risk analysis. For more exhaustive treatments of different optimisation methods,
readers are directed to specialised books on the subject, such as Rardin (1997), Dantzig and Thapa
(1997, 2003) and Bazaraa et al. (2004, 2006).
Optimisation methods aim to find the values of a set of related variable(s) in the objective function that will produce the minimum or maximum value as required. There are two types of objective function: deterministic and stochastic. When the objective function is a calculated value in the model (deterministic), we simply find the combination of parameter values that optimises this calculated value. When the objective function is a simulated random variable, we need to decide on some statistical measure associated with that variable that should be optimised (e.g. its mean, its 95th percentile or perhaps the ratio of standard deviation to mean). Then the optimising algorithm must run a simulation for each set of decision variable values and record the statistic. If one wanted, for example, to minimise the 0.1th
percentile, it would be necessary to run thousands of iterations, for each set of decision variable values
tested, to have a reasonable level of accuracy - and that can make optimising under uncertainty very
time consuming. As a general rule, we strongly advise that you try to find some means to calculate the
objective function if at all possible. ModelRisk, for example, has many functions that return statistical measures for certain types of model, and the relationships between stochastic models discussed in
Chapter 8 can help greatly simplify a model.
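To make that concrete, here is a deliberately brute-force sketch (in Python, with an invented one-variable cost model, grid and iteration count) of what optimising a statistic of a simulated variable involves: a fresh batch of iterations is run for every candidate decision value, which is precisely why this route is slow and why a directly calculable objective is preferable whenever one exists:

import numpy as np

rng = np.random.default_rng(3)

def simulated_cost(order_qty, n_iter=5000):
    # Toy stochastic cost model: uncertain demand with over- and under-stocking penalties
    demand = rng.lognormal(mean=5.0, sigma=0.4, size=n_iter)
    overstock = np.maximum(order_qty - demand, 0) * 2.0
    understock = np.maximum(demand - order_qty, 0) * 5.0
    return overstock + understock

# Optimise the 95th percentile of cost over a coarse grid of decision values
candidates = np.arange(100, 401, 10)
p95 = [np.percentile(simulated_cost(q), 95) for q in candidates]
best = candidates[int(np.argmin(p95))]
print("order quantity with the lowest simulated 95th percentile of cost:", best)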
Let's start by introducing an example.
When a pet food manufacturer wants to make an economically optimal allocation of ingredients for a dog formula, it may have the choice of using different commodities (e.g. corn or wheat as the main source of carbohydrates), but the company will want to use the combination of components that will minimise the cost of manufacturing without losing nutritional quality. Since the price of commodities fluctuates over short periods of time, the feed inputs will have to be optimised every time a new contract for commodities is placed. Hence, an optimal feed would be one that minimises the ration cost but also maintains the nutritional value of the feed (i.e. the required carbohydrate, protein and fat contents in a dog's healthy diet).


With this example we have introduced the reader to the concept of constrained optimisation, where the objective is still to minimise or maximise the output from a function by varying the input variables, but now the values of some input variables are constrained to only the feasible values of those variables (the nutritional requirements). Going back to the dog feed example, if we know that adult dogs require a minimum of 18 % protein (as % of dry matter), then the model solution should be constrained to the combinations of ingredients that will minimise the cost while still providing at least 18 % protein. An input can be subject to more than one constraint; for example, dogs may also have a maximum protein requirement (to avoid certain metabolic diseases), which can also be constrained in the model.
The optimal blending of diets is in fact a classical application of linear programming, an area of
optimisation that will be revisited later in this chapter.
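Before moving on, a toy version of the blending problem may help fix ideas. The sketch below (in Python, with invented prices and nutrient contents for three ingredients, so the numbers carry no nutritional authority) finds the cheapest mix that still delivers at least 18 % protein and at least 2 % fat, with the ingredient fractions summing to one:

import numpy as np
from scipy.optimize import linprog

# Ingredients: corn, wheat, soybean meal (all figures invented for the example)
cost = np.array([0.10, 0.12, 0.30])      # $ per lb
protein = np.array([0.09, 0.12, 0.48])   # protein fraction of dry matter
fat = np.array([0.04, 0.02, 0.01])       # fat fraction of dry matter

# Minimise cost subject to: protein >= 18 %, fat >= 2 %, fractions summing to 1
A_ub = np.array([-protein, -fat])        # linprog expects constraints as A_ub @ x <= b_ub
b_ub = np.array([-0.18, -0.02])
A_eq = np.array([[1.0, 1.0, 1.0]])
b_eq = np.array([1.0])

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * 3, method="highs")
print("optimal blend (corn, wheat, soybean meal):", np.round(res.x, 3))
print("cost per lb of blend:", round(res.fun, 4))

Because the objective and every constraint are linear in the ingredient fractions, this is exactly the kind of problem that the simplex-type methods discussed in the next section solve quickly and reliably.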
Optimisation requires three basic elements:

• The objective function f and its goal (minimisation or maximisation). This is a function that expresses the relationship among the model variables. The outputs from the objective function are called responses, performance measures or criteria.
• Input variable(s), also called decision variables, factors, parameter settings and design variables, among many other names. These are the variables whose values we want to experiment with using the optimisation procedure, and that we can change or control (make a decision about, hence the name decision variable).
• Constraints (if needed), which are conditions that a solution to an optimisation problem must satisfy to be acceptable. For example, when only limited resources are available, that constraint should be explicit in the optimisation model. Variable bounds represent a special case of constraints. For example, diet components can only take positive values; hence they are bounded below by zero.

Throughout this chapter we will review how these elements combine to create an optimisation model.
The field of optimisation is vast, and there are literally hundreds of techniques that can be used to solve
different problems. However, in practical terms the main differences between methods reside in whether the objective function and constraints are linear or non-linear, whether the parameters are fixed or include variability and/or uncertainty, and whether all or some parameters are continuous or integers. The following sections give the background to basic optimisation methods and then present practical examples.

16.2 Optimisation Methods
There are many optimisation methods available in the literature and implemented in commercial software.
In this section we introduce some of the most used methods in risk analysis.

16.2.1 Linear and non-linear methods
In Section 16.1 we presented a diet blend model and mentioned it was a typical linear programming
application. This model is linear since the objective function and constraints are linear. The general
form of a linear objective function can be expressed as:

max / min f(x₁, x₂, ..., xₙ) = a₁x₁ + a₂x₂ + ... + aₙxₙ    (16.1)

where f is the objective function to be minimised or maximised, the xᵢ are the input variables and the aᵢ their respective coefficients.


The objective function can be subject to constraints in the form

b₁x₁ + b₂x₂ + ... + bₙxₙ ≤ c    (16.2)

Equation (16.2) shows that the constraints imposed on the optimisation problem must also be linear for the problem to be considered a valid linear optimisation problem.
From Equations (16.1) and (16.2) we can deduce two important assumptions of linear optimisation:
additivity and proportionality:
Additivity entails that the values from the objective function are the result of the sum of all the
variables multiplied by their coefficients, independently. In other words, the increase in the results
of the objective function will be the same whether a certain variable increases from 10 to 11 or from
50 to 51.
Proportionality requires that the value of a term in the linear function is directly proportional to the
amount of that variable in the term. For example, if we are optimising a diet blend, the total cost of
corn in the blend is directly related to the amount of corn used in the blend. Hence, for example, the
concept of economies of scales would violate the assumption of proportionality since the marginal
cost decreases as we increase production.
The most common methodology to solve linear programming problems is called the simplex algorithm,
which was invented by George Dantzig in 1947 and is still used to solve purely linear optimisation
problems. For a good explanation of the simplex methodology the reader is directed to the excellent
book by Dantzig and Thapa (1997).
We cannot apply linear programming if our objective function includes a multiplicative term such as f(x₁, x₂) = a₁x₁ * a₂x₂, because we would be violating the additivity assumption. Recall that we mentioned that a unit increase in a decision variable will have the same impact on the results of the objective function, regardless of the current absolute value of the variable. We cannot make this assumption with our multiplicative example, since now the impact that a change in a variable has on the objective function will depend on the size of the other variable by which it is multiplied. For example, in a simple function f(x) = ax², with a = 5, if we increase x from 1 to 2, the result will change by 15 units (5 * 2² - 5 * 1²), whereas if x increases from, say, 6 to 7, the function will change by 65 units (5 * 7² - 5 * 6²).
Non-linear problems impose an extra challenge in optimisation, since they may present more than one minimum or maximum depending on the domain being evaluated. Optimisation methods aiming at finding the absolute largest (or smallest) value of the objective function in the domain observed are called global optimisation methods. We will discuss different approaches to global optimisation in Section 16.3.
The final case to consider is where the relationships in a function are not only non-linear but also non-smooth. For example, the relationships among some variables in the model may use Boolean or lookup logic (e.g. IF, VLOOKUP, INDEX, CHOOSE), with the effect that the function will present sudden changes, e.g. drastic jumps or drops, making it uneven or "jumpy". These functions are particularly hard to solve using standard non-linear programming methods and hence require special techniques to find reasonable solutions.


16.2.2 Stochastic optimisation
Stochastic optimisation has received a great deal of attention in recent years. One of the reasons for
this growth is that many applied optimisation problems are too complex to be solved mathematically
(i.e. using the linear and non-linear mathematical methods described in the previous section). Stochastic optimisation is the preferred methodology when problems include many complex combinations of options and/or relationships that are highly non-linear, since such problems either are impossible to solve mathematically or cannot feasibly be solved within a realistic timeframe.
Simulation optimisation is also essential if the parameters of the model are random or include uncertainty, which is usually the case in many of the models applied to real-world situations in risk analysis.
Fu (2002) presents a summary of current methodologies in stochastic optimisation, and some of the
applications of this method. Most commercial stochastic optimisation software use metaheuristics to
find the optimal solutions. In this method, the simulation model is treated as a black-box function
evaluator, where the optimiser has no knowledge of the detailed structure of the model. Instead, combinations of the decision variables that achieve desirable results (i.e. minimise the objective function
more than other combinations) are stored and recombined by the optimiser into updated combinations, which should eventually find better solutions. The main advantage of this method is that it does not get "stuck" in local minima or maxima. Some software vendors claim that this methodology also finds optimal values relatively faster than other methods, but this is not necessarily true,
especially when the optimisation problem can be quickly solved with well-formulated mathematical
functions.
Usually, three steps are taken at each iteration of the stochastic optimisation:

1. Possible solutions for the variables are found.
2. The solutions found in the previous step are applied to the objective function.
3. If the stopping criterion is not met, a new set of solutions is calculated after the results of the previous combinations are evaluated. Otherwise, stop.
Although the above process is conceptually simple, the key to a successful stochastic optimisation
resides in the last step, because trying all the combinations of values from different random variables
becomes unfeasible (especially when the model includes continuous variables). For this reason, most
implementations of stochastic optimisation focus their efforts on how to narrow the potential solutions
based on the solutions already known. Some of the methods used for this purpose include genetic algorithms, evolutionary algorithms, simulated annealing, path relinking, scatter search and tabu search, to
name a few. It is beyond the objective of this chapter to review these methodologies, but interested
readers are directed to the chapter on metaheuristics in Pardalos and Resende (2002), and to the work
by Goldberg (1989) and by Glover, Laguna and Marti (2000).
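The flavour of these methods can be conveyed with a bare-bones simulated annealing loop that treats the objective purely as a black box; the objective function, cooling schedule and step size below are arbitrary choices for the illustration and bear no relation to the algorithms inside any commercial optimiser:

import numpy as np

rng = np.random.default_rng(11)

def objective(x):
    # Black-box, non-smooth toy objective with many local minima
    return np.abs(x) + 3.0 * np.sin(3.0 * x) + (x > 4) * 10.0

x = 8.0                       # arbitrary starting point
best_x, best_f = x, objective(x)
temp = 5.0                    # initial "temperature"

for step in range(5000):
    candidate = x + rng.normal(scale=0.5)        # random neighbour of the current point
    delta = objective(candidate) - objective(x)
    # Always accept improvements; accept some worse moves early on to escape local minima
    if delta < 0 or rng.random() < np.exp(-delta / temp):
        x = candidate
        if objective(x) < best_f:
            best_x, best_f = x, objective(x)
    temp *= 0.999                                # slowly cool down

print(f"best x found: {best_x:.3f}, objective value: {best_f:.3f}")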
Most commercial Excel add-ins include metaheuristic-based stochastic optimisation algorithms. Some of the most popular include OptQuest for Crystal Ball, RISKOptimizer for @RISK and, very recently, Risk Solver. Similar tools are also available for discrete-event simulation suites. There is also a myriad of statistical and mathematical packages, such as R, SAS and Mathematica, that allow for complicated optimisation algorithms. At Vose Consulting we rely quite heavily on these applications (particularly R) when developing advanced models, but we will stick to Excel-based optimisers here to avoid having to explain their syntax structure.


16.3 Risk Analysis Modelling and Optimisation
In this section we introduce the reader to some applied principles to implement optimisation models in
a spreadsheet environment, and then briefly explain the use of the different possible settings in Solver,
the default optimisation tool in Excel.

16.3.1 Global optimisation
In the previous section we discussed some of the limitations of linear and non-linear programming, including the problem of local minima and maxima depending on the starting values. Figure 16.1 shows a simple function of the form

f(x) = sin(cos(x) exp(-x/4))

The function has several peaks (maxima) and valleys (minima) within the plotted range. A function like this is called non-linear (changes in f(x) are not proportional to changes in x), and also non-convex (i.e. line segments drawn from one point on the graph to another can lie above or below the graph of f(x), depending on the region of the function domain).
Optimisation software like Excel's Solver and other linear and non-linear constrained optimisation
software follow a path from the starting values to the final solution values, using as a guide the direction
and curvature of the objective function (and constraints). The algorithm will usually stop at the minimum
or maximum closest to the initial values provided, making the optimiser output quite sensitive to the
starting values.
For example, if the function in Figure 16.1 is to be maximised and a starting value is close to the smaller peak (Max 1), the "best" solution the software will find will be Max 1, when in fact the global peak for this particular function is located at Max 2.
Evidently, in most risk analysis applications the desirable solution will be the highest (or the lowest)
peak and not a local one. In other words, we always want to make sure that the optimisation is global
rather than local. Depending on the software used, there are several ways to make sure we can obtain
a global optimisation.
Excel's Solver is among the most broadly used optimisation software, as it is part of the popular
spreadsheet bundle, but its algorithms are very sensitive to the initial values provided by the analyst.
Thus, when possible, the entire feasible range of the objective function should be plotted to identify the
global peak or valleys. From evaluating the graph, a rough estimate can then be used as an initial value.
Consider the model shown in Figure 16.2. The objective function is again

f(x) = sin(cos(x) exp(-x/4))

and is unconstrained within the boundaries shown (-4.2 to 8). When plotting the function, we know the global maximum is somewhere close to x = -0.02, so we will use this value in Solver.
To do so, we first enter the value -0.02 into cell x (C2), and then we select Tools → Add-Ins, check the Solver Add-In box and click the OK button. Then go back to Excel and select Tools → Solver to obtain the menu shown in Figure 16.3.


Figure 16.1 A non-linear function presenting multiple maxima and minima.

Figure 16.2 Sensitivity of Excel's Solver to local conditions. The dot represents the optimal solution found
by Solver.

Under "Set Target Cell" we add a reference to the named cell fx (C3), then, since in this example we
want to minimise the function, we select "Equal To" Max and we finally add a reference to named cell
x (C2) under the "By Changing Cells" box. Now that we are ready to run the optimisation procedure
(we will see more about the Solver menus and options later in this chapter, now we will use the default
settings), we click the "Solve" button and after a very short period we should see a form stating that a

-

-

Chapter 16 Optimisation in risk analysis

44 1

Figure 16.3 Excel's Solver main menu.

solution has been found. Select the "Keep Solver Solution" option and click the " O K button. We can
see that Solver successfully found the global maxima since we provided a good initial value.
What would happen if we didn't provide a reasonable initial value? If we repeated the same procedure but started with, say, -3 in cell x, we would obtain a maximum at x = -3.38, which turns out to be the first peak (Max 1 in Figure 16.1). If we started with a larger value, e.g. 4, Solver would find x = 6.04 as the optimal maximum, which is Max 3 in Figure 16.1. The reader can use the supplied model and try initial values to look for minima and maxima and explore how the optimisation algorithm behaves, particularly to notice the model behaviour when the Solver options (e.g. linearity assumption, quadratic estimates) are changed.
An alternative for dealing with local minima and maxima is to restrict the domain to be evaluated. We have already limited the domain by exploring only a limited section of our objective function (-4 to 8). However, the domain still contained several peaks and valleys. In contrast, if the domain observed contains only one peak or valley (e.g. (-2, 2)), the function becomes concave (or convex), which can be solved with a variety of fast and reliable techniques such as the interior point methods readily implemented in Solver. Since we know the global peak resides somewhere around zero, we can restrict the domain of the objective function to (-2, 2) using the constraint feature in Solver. First enter -2 in cell C6 and 2 in cell C7. Then name the cells "Min" and "Max" respectively. After that, open Solver and click on the Add button. Type "x" under Cell Reference, select <= and then type "=Max" in the Constraint box. Once that is completed, click the Add button and, following the same procedure, add the second constraint, x >= Min. Once both constraints are added, click OK and then Solve. Solver should find an optimal x close to -0.25, which is the global maximum; so, even though the function has many local optimal values, we have now successfully restricted the domain enough that the numerical method can easily find the optimal value. Even if an aberrant number is entered (e.g. 1000) as the initial value, the domain is so narrow now that the algorithm will still find the optimal value. Try it!
When the function is not tractable (e.g. complex simulation models), plotting is not an option since the
figure could be k-dimensional (and we all have a hard time interpreting elements with more than three
dimensions). Hence, for this case, if the user plans on using Solver, he or she should attempt different
initial values manually, based on the knowledge of the system being modelled. Another more automated
option is to use more sophisticated applications that rely on metaheuristic methods, as explained in
Section 16.2.2. Later in this chapter we present the solution to a problem where the function not only
is intractable but also is highly non-linear and non-smooth and contains a series of integer decision
variables and complex constraints.


Commercial optimisation software use different methods to make sure only global optimal solutions
are found. As already discussed, metaheuristic methods can be very efficient in finding global optimal solutions. Other commercial software rely upon multistart methods for global optimisation, which
automatically try different starting values until a global solution is found. Although they are reasonably effective, such methodologies can be quite time consuming when solving highly non-linear and
non-smooth functions, or when little is known about the parameters to optimise (uninformed starting
values).
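A crude multistart strategy is also easy to script directly. The sketch below simply launches a standard local optimiser from a number of random starting points and keeps the best result; the multi-peaked objective is an arbitrary stand-in, not the function plotted in Figure 16.1, and twenty restarts is an equally arbitrary choice:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

def f(x):
    # Arbitrary multi-peaked function; we maximise it by minimising its negative
    return np.sin(x[0]) * np.exp(-0.05 * x[0] ** 2)

best = None
for _ in range(20):
    x0 = rng.uniform(-8.0, 8.0, size=1)          # random starting value
    res = minimize(lambda x: -f(x), x0, method="Nelder-Mead")
    if best is None or res.fun < best.fun:       # smaller negative means larger f
        best = res

print(f"best x: {best.x[0]:.3f}, f(x): {-best.fun:.3f}")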

16.3.2 A few notes on using Excel's Solver
We have already mentioned that Excel's Solver is an optimisation tool built into Microsoft Excel and shipped with all copies of Excel. Although the tool has limitations, it can be used in a variety of situations when stochastic simulation is not required. Solver implements a variety of algorithms to solve linear and non-linear problems. It uses the generalised reduced gradient (GRG) algorithm to solve non-linear programming problems and, when the correct settings are used, it can use the simplex method, a well-known and robust method for solving linear optimisation problems.
The mysterious Options menu in Solver

It is likely that many readers have used or tried to use Solver in the past and have managed fairly
well. It is also likely that the reader has clicked on the Options button and didn't quite understand the
meaning of all the settings. Furthermore, many readers may have found the explanations in the help file
to be rather cryptic, so we will explain the various options.
We have already explained in previous sections how to use the general Solver menu. Now we will
focus on the menus that appear under the Options button. To get there, select Tools → Solver and then click the Options button. The menu in Figure 16.4 should be displayed.
We briefly describe the meaning of each option below:
The Load Model and Save Model buttons enable the user to recall and retain model settings so they don't need to be re-entered every time the optimisation is run.

Figure 16.4 The Options menu in Excel's Solver.


Max Time limits the time taken to find the solution (in seconds). The default 100 seconds should
be appropriate for standard linear problems.
Iterations restricts the number of iterations the algorithm can use to find a solution.
Precision is used to determine the accuracy with which the value of a constraint meets a target value. It is a fractional number between 0 and 1; the higher the precision, the smaller the number (i.e. 0.01 is less precise than 0.0001).
Tolerance applies only to integer constraints and is used to estimate the level of tolerance (as a
percentage) by which the solution satisfying the constraints can differ from the true optimal value
and still be considered acceptable. In other words, the lower the tolerance level, the longer it will
take for the solutions to be acceptable.
Convergence applies only to non-linear models and is a fractional number between 0 and 1. If after
five iterations the relative change in the objective function is less than the convergence specified,
Solver stops. As with precision and tolerance, the smaller the convergence number, the longer it will
take to find a solution (up to Max Time that is).
Lowering precision, tolerance and convergence values will slow down the optimisation, but it may help
the algorithm to find a solution. In general, these defaults should be changed if Solver is experiencing
problems finding an optimal solution.

Assume Linear Model is a very important choice. If the optimisation problem is truly linear, then this option should be chosen because Solver will use the simplex method, which should find a solution faster and be more robust than the default optimisation method. However, the function has to be truly linear for this option to be used. Solver has a built-in algorithm that checks for linearity conditions, but the analyst should not rely solely on this to assess the model structure.
When the option Show Iteration Results is selected, Solver will pause to show the result of each
iteration, and will require user input to reinitiate the next iteration. This option is certainly not
recommended for computing intensive optimisation.
When selected, Use Automatic Scaling will rescale the variables in cases where variables and results have large differences in magnitude.
Assume Non-Negative will apply a lower bound of zero to all the decision variables that have not been explicitly constrained. It is preferable, however, to specify the variable boundaries explicitly in the model.
The Estimates section allows one to use either a Tangent method or a Quadratic method to estimate
the optimal solution. The tangent method extrapolates from a tangent vector, whereas quadratic is
the method of choice for highly non-linear problems.
The Derivatives section specifies the differencing method used to estimate partial derivatives of
objective and constraint functions (when differentiable, of course). In general, Forward should be
used for most problems where the constraint values change slowly, whereas the Central method
should be used when the constraints change more dynamically. The Central method can also be chosen when Solver cannot find improving solutions.
Finally, the Search section allows one to specify the algorithm used to determine the direction to
search at each iteration. The options are Newton, which is a quasi-Newton method to be used when
speed is an issue and computer power is a limiting factor, and Conjugate, which is the preferred
method when memory is an issue, but speed can be slightly compromised.


Automating Solver with Visual Basic for Excel

One of the most powerful tools in Excel is its integration with Visual Basic for Applications (VBA). This integration can also be extended to optimisation models with Solver. We will use the model presented in Section 16.3.1 to show how to automate Solver in Excel. The steps are:
1. Record a macro using Tools → Macro → Record New Macro and name the macro accordingly (e.g. "SolverRun").
2. Open the Solver form as previously explained and press Reset All to clear existing settings.
3. Repeat the steps followed to optimise the model (e.g. set the objective function, decision variables and constraints) and click the Solve button.
4. Once Solver has found a solution, stop recording the macro by clicking on the small red square in the macro toolbar, or by using Tools → Macro → Stop Recording.
5. Use the Forms toolbar to add a button to the sheet.
6. Then assign the macro (e.g. "SolverRun") to the button by double clicking on it while in Design Mode and typing "Call SolverRun" in the procedure. For example, assuming the button is called CommandButton1, the VBA procedure should look as follows:

Private Sub CommandButton1_Click()
    Call SolverRun
End Sub

7. Add a reference to Solver in Visual Basic by pressing Alt+F11, and then in the Visual Basic menu select Tools → References and make sure the box next to "Solver" is selected.
8. The VBA code for the recorded macro should look similar to the example below:
Sub SolverRun()
    'This macro runs Solver automatically
    SolverOk SetCell:="$C$3", MaxMinVal:=1, ValueOf:="0", ByChange:="$C$2"
    SolverAdd CellRef:="$C$2", Relation:=1, FormulaText:="Max"
    SolverAdd CellRef:="$C$2", Relation:=3, FormulaText:="Min"
    SolverOk SetCell:="$C$3", MaxMinVal:=1, ValueOf:="0", ByChange:="$C$2"
    SolverSolve UserFinish:=True
End Sub

Notice we have added an extra line, "SolverSolve UserFinish:=True", which suppresses the optimisation results from being shown at the end of each iteration. Now everything should be ready to use the macro. Make sure to exit Design Mode and click on the button. The resulting model is not shown here but is provided for the user to explore.

16.4 Working Example: Optimal Allocation of Mineral Pots

This exercise is based on a simplified version of a real-life example taken from our consulting work.
A metallurgical company processes metal into 14 small containers called pots. The contents of the
pots are then split among four larger tubs which are then used to create the final metal product that is
sold. The resulting product receives a premium based on its level of purity (lack of unwanted minerals).


Since the input ore is different from batch to batch, the impurity levels will likely be different. It is then economically important to achieve a certain purity level among batches while avoiding "bad" levels. The goal of the model is to optimise the allocation of pot metal contents into tubs in order to achieve a certain purity level in the final product.¹
Note that, in reality, since the impurity level is estimated with samples, there is uncertainty about
the actual impurity level of each batch. The client required that one was, say, 90 % confident that the
concentration of each impurity in a tub was less than a certain threshold. Since speed was an important
issue for the client, we avoided simulation by using classical statistics estimates of a mean (Chapter 9)
to determine the 10th percentile of the uncertainty distribution for the true concentration in a tub.
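A minimal version of that check, with invented sample data, is sketched below: classical uncertainty about a mean concentration follows a t-distribution, and being 90 % confident that the true mean impurity sits below a threshold amounts to checking that the relevant one-sided percentile of that uncertainty distribution clears it (whether one works with the 10th or the 90th percentile simply depends on whether the quantity tracked is a purity or an impurity):

import numpy as np
from scipy import stats

# Invented impurity measurements (%) from samples taken from one tub
samples = np.array([4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 4.0])
threshold = 4.5
confidence = 0.90

n = len(samples)
xbar, s = samples.mean(), samples.std(ddof=1)

# Classical uncertainty about the true mean concentration follows a t-distribution
upper = xbar + stats.t.ppf(confidence, df=n - 1) * s / np.sqrt(n)

# We are 90 % confident that the true mean impurity lies below `upper`
print(f"90th percentile of the uncertainty about the mean: {upper:.3f} %")
print("meets the client's requirement:", upper < threshold)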
For each pot, the variables are:

• purity of metal A (as a percentage of total weight);
• purity of metal B (as a percentage of total weight);
• weight (in pounds).
As the reader may imagine, the plant's operations present several constraints to be modelled, which
are listed below:
1. A minimum of 1000 lb should be taken per pot.
2. The quantities taken from the pots are measured in discrete increments of 20 lb.
3. A maximum of five pots can be allocated to a given tub.
4. Pots can only be split in two parts (i.e. the contents of a pot cannot be split into three or four different tubs).
5. The maximum metal tonnage taken from a pot is equal to the pot weight (obvious, but this needs
to be explicit in the model).
6. Every pot should be allocated into at least one tub (no "leftover" pots).
7. The maximum and minimum weights contained per tub are constrained (by certain values, for this
example assumed to be a minimum of 5000 lb and a maximum of 10 000 lb).
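One way to appreciate what the optimiser has to respect is to write the constraints as a single pass/fail check on a candidate allocation, which is essentially what a "requirement" does in the simulation optimisation discussed below. The sketch that follows is a simplified, hypothetical version (a pots-by-tubs array of pounds, with tiny example numbers), not the client model itself:

import numpy as np

def passes_requirements(alloc, pot_weights,
                        min_take=1000, increment=20,
                        max_pots_per_tub=5, min_tub=5000, max_tub=10000):
    # True if a candidate pots-by-tubs allocation (in lb) meets constraints 1-7
    taken = alloc > 0
    checks = [
        np.all(alloc[taken] >= min_take),                 # 1: at least 1000 lb per take
        np.all(alloc % increment == 0),                   # 2: 20 lb increments
        np.all(taken.sum(axis=0) <= max_pots_per_tub),    # 3: at most five pots per tub
        np.all(taken.sum(axis=1) <= 2),                   # 4: a pot split across at most two tubs
        np.all(alloc.sum(axis=1) <= pot_weights),         # 5: cannot exceed the pot weight
        np.all(taken.sum(axis=1) >= 1),                   # 6: no leftover pots
        np.all((alloc.sum(axis=0) >= min_tub) &
               (alloc.sum(axis=0) <= max_tub)),           # 7: tub weight within limits
    ]
    return all(checks)

# Tiny illustration: 3 pots, 2 tubs, with smaller tub limits than the real problem
pot_weights = np.array([2660, 2660, 2660])
alloc = np.array([[2660, 0], [1660, 1000], [1000, 1660]])
print(passes_requirements(alloc, pot_weights, min_tub=2500, max_tub=10000))   # True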
Given the number of constraints and possible combinations to be optimised, this model would be quite complex to define in mathematical terms (especially when considering parameter uncertainty), and hence a more practical approach is to use simulation optimisation. For this particular example, we employed OptQuest
with Crystal Ball for its ease of use and connection with Excel, but other commercial spreadsheet add-ins
could be used to achieve similar results. OptQuest is used here for a deterministic model but handles
stochastic optimisation equally well.
One powerful feature of simulation optimisation is that complex constraints such as those imposed in this model can be handled by discarding the scenarios that violate them, rather than by building them explicitly into the objective function. Such constraints are sometimes called simulation requirements. Although this
approach can be slower than incorporating the constraint directly in the model, it allows for very
complex interactions in the model. Also, the models can be significantly sped up by compiling many
input variables into only one requirement variable. Figure 16.5 shows the general structure of the model.
Cells with a grey background represent input variables (variables that are changed during the optimisation process), and cells with a black background are requirements that are used to set the model constraints and define the objective function. The larger table in range G2:J17 contains the purity levels for both minerals and the weight of each pot. The small table on the right contains the target purity levels that the model will optimise for.

Figure 16.5 The metallurgical optimisation model implemented in Excel.

Figure 16.6 Dialogue to create decision variables in OptQuest with Crystal Ball.

¹ Another goal was to optimise for several purity levels by their dollar premiums, but that is omitted here for simplicity.
The "Pounds" table (range B2:E16) contains input variables that are modified during the optimisation.
By selecting Define + Define Decision in Crystal Ball's menu, the user will see the form shown
in Figure 16.6 with the settings for cell C3. The variables are discrete and can only increment in steps
of 20 pounds (constraint 2), and are constrained to a fixed minimum of 1000 lb (constraint 1) and a
maximum equal to the total content in the pot, which will vary depending on the batch; hence, the
maximum value is linked to the cell that contains the pot weight. Similar variables are created for
each combination of pots in tub 1, the only difference being the cell reference to their maximum weight.
Decision variables are only needed for the first tub since the allocation for the other tubs is calculated
on the basis of the initial allocation to the first tub. Thus, the remaining cells in the "Pounds" matrix
are left empty or with a constant value of 1.
The "Switches" matrix (range B19:E34) contains input variables that can only take values of 0 or 1.
The set of input variables from the "Pounds" matrix is multiplied by the variables in the "Switches"
matrix to generate the output matrix "Output for objective Fx". Notice that, for the "Switch" variables,
input variables are only needed for the first three tubs, because the fourth tub can be filled with what is
left in the pots after their content has been allocated to the other three.
The remaining components in the model are the constraints and objective function. As previously
mentioned, for this model some constraints are built into the simulation model, whereas others are set
as scenarios that cannot be included in the optimal solution. Hence, anything that does not meet certain
requirements is "tossed" from the set of possible options. The equation from pot 6, tub 1, in the output
matrix incorporates constraint 3 as follows:

which summarises into "if 5 pots have been allocated to tub 1, then do not allocate the product from
pot 6 into tub 1, otherwise, allocate the content defined in cell C8 (which is a decision variable as in
Figure 16.6) multiplied by cell C26". The first part of this equation limits a maximum of five pots to
be allocated to the tub (constraint 3). The second part (multiplication of two cells) is used to make sure that there is no bias in the order of the allocation of the pots to tubs (by using the binary decision variables in the "Switch" matrix). The same logic is used for pots 7 to 14 in tub 1.

(Footnote: For some reason unknown to this author, sometimes the cell reference in the decision variables may be lost after opening OptQuest, and will be replaced by the last number in the cell, e.g. the maximum weight entered for the pot (we are using OptQuest with Crystal Ball V. 7.3, Build 7.3.814). Readers should be aware of this issue when using this and other models with dynamic referencing on decision variable parameters.)
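The cell formula itself is not shown above, but a minimal Python sketch of the logic just described (the function and argument names are illustrative, not the workbook's) is:

def pot6_tub1_allocation(tub1_weights_pots1to5, pounds_c8, switch_c26):
    """Allocate pot 6 to tub 1 only if fewer than five pots already feed tub 1."""
    pots_in_tub1 = sum(1 for w in tub1_weights_pots1to5 if w > 0)
    if pots_in_tub1 >= 5:            # constraint 3: at most five pots per tub
        return 0
    return pounds_c8 * switch_c26    # decision weight (C8) times the 0/1 switch (C26)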
For tubs 2 and 3, the equation for pot 6 is modified so that we add "if the remaining weight left in the pot is less than 1000 pounds, do not allocate any metal to this tub (constraint 1), otherwise, allocate the remaining material from pot 6 into this tub". The subtraction from the total pot weight satisfies constraint 5.
The reader will notice that, since we can only allocate one pot to one or two tubs (constraint 4),
there is no need for an input variable in columns D and E since the material allocated to tubs 2 to 4
is dependent on whether tub 1 received material from a given pot. Thus, the pot/tub combinations for tubs 2 to 3 in the "Pounds" matrix contain only 1s, so a 1 is returned when multiplied by 1s from the "Switch" matrix.
Finally, metal from the pot that has not been allocated to tubs 1 to 3 (and that is at least 1000 pounds)
is allocated to tub 4. As for the other tubs, formulas from pot 6 onwards are constrained, so no more
than 5 pots can be allocated to one tub.
We cannot waste any remaining material in a pot, of course, so another exogenous constraint (requirement) that we add is that the sum of the pounds allocated from a pot should be exactly the same as the
total weight in that pot. In addition, we can include constraint 6 into the same requirement to speed up
the optimisation. The resulting formula (cell M21 shown, same for all pots) is

In other words, if the pot has not been allocated to more than two tubs, and the sum of the weights
allocated is equal to the weight of the pot, then return a 1, otherwise a 0. The same test is applied to
each pot. Therefore, to meet the conditions, the sum of cells M21:M34 (cell M36) should be exactly 14
because, if all pots "pass the test", each individual pot test should return a 1.
Some readers may wonder why constraint 6 was added into this formula although it was already
mentioned that, if nothing is allocated to tubs 1 to 3, the total weight is allocated to tub 4. In reality, the
constraint is not necessary but is left in the equation to exemplify how to combine several constraints
into one formula, making the model computations significantly faster. Also, when a model is going to be
continuously modified, it is always good to have logical checks to make sure the algorithm is working
the way it is supposed to.
Before we include the final values in the objective function, we need to identify which tubs are lower
than the desired threshold of purity for minerals A and B. The formula we use for this is (cell H40:K40,
H40 shown):
= IF(AND(H38 < OptA, H39 < OptB), 1, 0)

where OptA and OptB are the optimal purity levels for metals A and B respectively.
This formula returns a 1 when the requirements are met. Finally, the objective function is contained
in cell N40 and is the sum of the total weights per tub, multiplied by the "Good tub" indicator. The
optimisation model will try to maximise the value of this objective function (the total weight of "good" metal in tubs).
Once the variables, constraints and objective functions are defined, the last step left is to use OptQuest
to set up and then run the optimisation procedure. To do so, in the Crystal Ball menu, select Run | OptQuest and open a new optimisation file. All variables in the Decision Variables form should be
selected. In the Forecast Selection form the inputs should be selected as in Figure 16.7 below.


Figure 16.7 Forecast selection menu in OptQuest for the metallurgic optimisation model.

The objective function is maximised (we want to have the maximum amount of pure metal), the
constraint tests should be equal to 14 and the minimum and maximum contents of the tubs should be
5000 and 10 000 lb respectively (constraint 7). The software will discard any scenario that does not
meet the requirements, and the objective function will be maximised by finding the best combination
of input variables.
Provided the initial values are reasonable, an optimal solution takes less than an hour to run on a
modern PC, which is important because the production line has to run this model twice a day.

16.4.1 Uncertainty in the model
In the actual model for our client we included the uncertainty about the impurity concentrations. The
user set a required confidence level CL (e.g. 90 %), and the model optimised to produce "tubs" that had
less than the specified impurity level with this confidence. The amount of impurity is determined by
Weight of pot * Impurity concentration
The source of the uncertainty came from the uncertainty of the weight of a pot (mean μP and standard deviation σP, in lb) and from the uncertainty of the impurity concentrations (mean μA and standard deviation σA for impurity A, for example). Treating the two as independent, the mean μ and standard deviation σ of the distribution of the product of these two random variables are given by

μ = μP μA,    σ = √(μP² σA² + μA² σP² + σP² σA²)

In order to calculate the impurity level at the required confidence, we use Excel's NORMINV(CL, μ, σ) function. The normal approximation is a reasonable approximation in this case because the
uncertainty of the concentration was close to a normal distribution and was greater than the weight
uncertainty so dominated the shape of the product. As mentioned before, finding a way of avoiding
having to optimise a simulation model (rather than the calculation model here) is very helpful because
it speeds up the optimisation time hugely: one calculation replaces a simulation of, say, 1000 iterations
to be sure of the required confidence level value.
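As a rough sketch of that single calculation in Python, assuming the two uncertainties are independent (the parameter values in the example call are purely illustrative, not the client's data):

from math import sqrt
from scipy.stats import norm

def impurity_at_confidence(mu_w, sd_w, mu_c, sd_c, cl=0.90):
    """Normal approximation to the CL quantile of (pot weight x impurity concentration)."""
    mean_product = mu_w * mu_c
    sd_product = sqrt(mu_w**2 * sd_c**2 + mu_c**2 * sd_w**2 + sd_w**2 * sd_c**2)
    # Excel equivalent: NORMINV(CL, mean_product, sd_product)
    return norm.ppf(cl, loc=mean_product, scale=sd_product)

print(impurity_at_confidence(2660, 50, 0.05, 0.005, cl=0.90))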

Chapter 17

Checking and validating a model
In this chapter I describe various methods that can be used to help validate the quality and predictive
capabilities of a model. Some techniques can be carried out during a model's construction, which will
help ensure that the finished model is as free from errors and as accurate and useful as possible. Other
techniques can only be executed at a future time when some of the model's predictions can be compared
against what actually happened, but one may nonetheless devise a plan to help facilitate that comparison.
Key points to consider are:
Does the model meet management needs?
Is the model free from errors?
Are the model's predictions robust?
The following topics describe the methods we use to help answer these questions:
Ensuring the model meets the decision-makers' requirements.
Comparing predictions against reality.
Informal auditing.
Checking units propagate correctly.
Checking model behaviour.
Comparing results of alternative models.

17.1 Spreadsheet Model Errors
Your company may have hundreds or thousands of spreadsheet models in use. If even 1 % of these
have errors, you could be making many decisions based on quite inaccurate information. If you now
introduce risk analysis models using Monte Carlo simulation, which is more difficult to write (because
we have to write models that work dynamically) and to check (because the numbers change with each
iteration), the problem could get much worse.
Errors come in several forms:

Syntax errors where a formula is incorrectly put together. For example, you mismatch brackets, forget to make a formula into an array formula (by entering with Ctrl + Shift + Enter instead of just Enter), use the wrong function, etc.
Mechanical errors which are hitting the wrong key, pointing to the wrong cell, etc. About 1 % of
spreadsheet cells contain such errors.
Logical errors which are incorrect formulae due to mistaken reasoning, misunderstanding of a
function or the appropriate use of probability mathematics. These errors are more difficult to detect
than mechanical errors and occur in about 4 % of spreadsheet cells in normal (unrisked) models.



Application errors where the spreadsheet function does not perform as it should. Excel generates incorrect results for some statistical functions: GAMMADIST and BINOMDIST are awful, for example. Some versions of Excel also don't automatically update all formulae correctly - use Ctrl + Alt + F9 instead of F9 to be sure it updates correctly. Random number generation for certain distributions is quite numerically difficult, so you will see artificial limits to the parameters allowed for distributions in a lot of software: @RISK, for example, allows a maximum of 32 767 trials in a binomial distribution and for a hypergeometric population, while Crystal Ball allows a maximum of 1000 for a Poisson mean and parameters for the beta distribution must lie on [0.3, 1000]. It is frustrating, of course, to have to work around such limits, and often you'll only find them because the model didn't work for some iterations, so we have designed ModelRisk to have no such issues.
Omission errors where a necessary component of the model has been forgotten. These are the most
difficult errors to detect.
Administrative errors, for example using an old version of a spreadsheet or graph, failing to update
a model with new data, failing to get the spreadsheet to recalculate after changes, importing data
from another application incorrectly, etc.


We have tried to help reduce the frequency of these types of error with ModelRisk. Each function
returns an informative error message when inappropriate parameter values are entered. For example:
= VoseNormal(100, -10) returns "Error: sigma must be >= 0" because a standard deviation cannot be negative.
= VoseHypergeo(20, 30, 10) returns "Error: n must be <= M" because one cannot take more
samples without replacement (n = 20) than there are individuals in the population (M = 10).
{= VoseAggregateMoments(VosePoissonObject(10), VoseLognormal(10, 3))} returns "Error: Severity distribution not valid" because the severity distribution needs to be an object, e.g. VoseLognormalObject(10, 3).

If you write any user-defined functions, with which the Excel user will be less familiar, please consider doing the same.
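A sketch of that pattern in Python (the function and messages are illustrative, not ModelRisk's own code):

import random

def my_normal(mean, sigma):
    """User-defined sampler that returns an informative message rather than
    a cryptic error when inappropriate parameter values are entered."""
    if not all(isinstance(x, (int, float)) for x in (mean, sigma)):
        return "Error: mean and sigma must be numeric"
    if sigma < 0:
        return "Error: sigma must be >= 0"
    return random.gauss(mean, sigma)

print(my_normal(100, -10))   # "Error: sigma must be >= 0"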
In ModelRisk we have also chosen to return pedantically correct answers for probability calculations,
for example:
= VoseHypergeoProb(2, 10, 25, 30, 0) returns 0: this is the probability of observing exactly two successes where the minimum possible is 5. If it's impossible, the probability is zero.
= VoseBinomialProb(50, 10, 0.5, 1) returns 1: the probability of observing less than or equal to 50
successes when there are only 10 trials.
This means that you don't have to write special code to get around the function giving errors. For
example, the Excel equivalent formulae are:
= HYPGEOMDIST(2, 10, 25, 30) returns #NUM!
= BINOMDIST(50, 10, 0.5, 1) returns #NUM!

You also need to check how your Monte Carlo simulation software handles special cases for particular
values. Poisson(0), for example, means that the variable can only be zero. In a simulation model, it


would be perfectly reasonable for a cell simulating a concentration to produce a zero value that fed into
a Poisson distribution. However, software will handle this differently:
@RISK: = RiskPoisson(0) returns #VALUE!
Crystal Ball: = CB.Poisson (0) returns #NUM!
ModelRisk: = VosePoisson(0) returns 0
Perhaps the most useful error-reducing feature in ModelRisk is that we have interfaces that give a
visual explanation and check of most ModelRisk features. For example, a cell containing the formula
= VoseGammaProb(C3:C7, 2, 3, 0) returns the joint probability of the values in cells C3:C7 being
randomly generated from a Gamma(2, 3) distribution. Selecting the cell with this formula and then
clicking ModelRisk's View Function icon pulls up the interface shown in Figure 17.1.
Crystal Ball and @RISK both have very good interfaces, although these are limited to input distributions only.
A quick Internet search for "spreadsheet model errors" will provide you with a wealth of individuals
and organisations who research into the source and control of spreadsheet errors. For example, the
European Spreadsheet Risks Interest Group is dedicated to the topic. Raymond Panko from the University


Figure 17.1 Visual interface in ModelRisk for the formula VoseGammaProb(C3:C7,2,3,0).


of Hawaii is a leader in the field and provides an interesting summary of spreadsheet error rates and
reasons at http://panko.shidler.hawaii.edu/SSR/index.htm.
Looking at the error percentages, for large models the question is not "Are there any errors?" but "How many errors are there?". A company can help minimise model errors by establishing and enforcing a policy for model development and for model auditing. Dr Panko reports the recommendation of professional model auditors that one should spend 1/3 of the development time in checking the model.

17.1.1 Informal auditing
Studies have shown that the original builder of a spreadsheet model has a lower rate of error detection
than an equivalently skilled coworker. It's not so surprising, of course, since we are more inclined than
a reviewer to repeat the same logical, omission and administrative errors.
At Vose Consulting we do a lot of internal auditing. An important part of the process is sitting down and explaining to another analyst the decision question(s) and the model structure with
pen and paper and then how we've executed it in a spreadsheet. Just the process of providing an
explanation will often lead to finding errors in your logic, or to finding simpler ways to write the
model.
Get another analyst to go through your code with the objective of finding your errors, so that a
successful exercise is one that finds errors rather than one that pronounces your model to be error free.
Having several analysts look at your model is even better, of course - it is interesting how people find
different errors. For example, in writing our software, some of our team are just great at finding numerical bugs, others at wrong formulae, others still at finding inconsistencies in structure or presentation.
Different things jump out at different people.

17.1.2 Checking units propagate correctly
I studied physics at university, and one of the first things you learn to do is a "dimensional analysis" of
formulae. For example, there exists an equation relating initial speed u and final speed v to the distance s over which a body has constant acceleration a:

v² = u² + 2as

The dimensions involved are length L (in metres, for example) and time T (in seconds, for example). Distance has units L, speed has units L/T, and acceleration has units L/T². Replacing the elements in the above formula with their dimensions gives

(L/T)² = (L/T)² + (L/T²)(L)

You can see that the left- and right-hand sides of the equation have the same units and that, when we
add two things together, they have the same units too (so we are not adding "apples and oranges"). In
a spreadsheet model we can use the same logic to help make sure our model is constructed properly.


It is good practice to label cells containing a number or formula with some explanation of what that
value represents, but including units makes the logic of the model even clearer; for example, noting
the currency when there is more than one in your model, or, if it is a rate, then noting the denominator, e.g. "$US/ticket" or "cases/outbreak". Then checking that the units flow through the model using
dimensional analysis will often reveal errors.
Checking that the same units are used for a dimension (length, mass, etc.) is also important. We
commonly come across two problems in this category in our auditing activities that are easily avoided:
Fractions. The first is the use of a fraction, where the modeller might label a cell "Interest rate ( %)"
and then write a value like "6.5". Of course, to apply that interest rate, s/he will have to remember
to divide by 100 to get to a percentage, and we've found that this is sometimes forgotten. Better by
far, in our view, is to label a cell "Interest rate" and input the value "6.5 %" which will show on
screen as 6.5 % but will be interpreted by Excel as 0.065 and can therefore be used directly.
Thousands, millions, etc. In large investment analyses, for example, one is often dealing with very
large numbers, so the modeller finds it more convenient to use units of thousands or millions. This
would not present a problem if the entire spreadsheet used the same units, but very commonly there
will be certain elements that do not; for example, cost/unit or price/unit for a manufacturer or retailer
of high-volume products. The danger is that in summary calculations that evaluate cashflow streams,
the modeller may forget to divide by 1000 or 1000000, in keeping with other currency cells. Even
if it is all done correctly, it is more difficult to follow formulae where "/1000" and "*1000000"
appear without explanation.
Our preference is that the model be kept in the same units throughout, a base currency unit, for
example, like $, £ or €. Admittedly this can be tricky if you're converting from values you know in
thousands or millions - we can easily get all those zeros mixed up. A convenient way to get around this
in Excel is to use special number formatting. We use a few formats in particular, employing Excel's Format | Cells | Custom feature: the first displays 123 456 789 as £123.5M; the second does the same but displays negative values in red; the third does the same as the second but places the "£M" next to the numbers rather than left justified; and the fourth displays 123 456 789 as £123,456.8k.
You can, of course, substitute a different currency symbol.
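The custom format codes themselves are not reproduced above. As an illustration only (these are not necessarily the author's exact codes), the behaviour described can be obtained because each trailing comma in an Excel custom number format divides the displayed value by 1000: a format along the lines of £#,##0.0,,"M" shows values in millions with an "M" suffix, £#,##0.0,,"M";[Red]-£#,##0.0,,"M" does the same but shows negatives in red, and £#,##0.0,"k" shows values in thousands with a "k" suffix.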


time series summary plots;
correlation and regression statistics.
They are discussed at length in Chapter 5.

17.2.5 Stressing parameter values
A very useful, simple and powerful way of checking your model is to look at the effect of changing
the model parameters. We use two different methods.
Propagate an error

In order to check quickly what elements of your model are affected by a particular spreadsheet cell,
you can replace the cell contents with the Excel formula: =NA(). This will show the warning script
"#N/AU(meaning data not available) in that cell and any other cell that relies on it (except where the
ISNA() or ISERROR() functions are used). Imbedded Excel charts will simply leave the cell out. I like
this method very much because it is quicker than using the Excel audit toolbar to trace dependents and
it also works when you have VBA macros that pick up values from cells within the code, i.e. when
the cells aren't inputs to the macro function the Trace Dependents function in Excel won't work in that
situation.
Set parameter values to extremes

It is difficult to see whether your Monte Carlo simulation model is performing correctly for low-probability outcomes because generating scenarios on screen will obviously only rarely show those low-probability scenarios. However, there are a couple of techniques for concentrating on these low-probability events by temporarily altering the input distributions. We suggest that you first resave your
model with another name (e.g. append "test" to the file name) to avoid accidentally leaving the model
with the altered distributions. You can generate model extremes as follows:
(a) Set a discrete variable to an extreme instead of its distribution. The theoretical minimum and
maximum of discrete bounded distributions are provided in the formulae pages for each distribution
in Appendix III. Many distributions have a zero minimum, but only a few distributions have a
maximum value (e.g. binomial). In general, it is not a good idea to stress a continuous variable with
its minimum or maximum, however, because such values have a zero probability of occurrence
and so the scenario is meaningless.
(b) Modify the distribution to generate values only from an extreme range. This is particularly useful
for continuous distributions, and for discrete distributions where there is no defined minimum
and/or maximum. Monte Carlo Excel add-ins normally offer the ability to bound a distribution. For example, in ModelRisk we can write the following to constrain a lognormal distribution (a Python sketch of the same truncation idea appears after this list):
Only values above 30: = VoseLognormal(10, 5, , VoseXBounds(30, ))
Only values below 5: = VoseLognormal(10, 5, , VoseXBounds(, 5))
Values between 10 and 11: = VoseLognormal(10, 5, , VoseXBounds(10, 11))


In @RISK, this would be
= RiskLognorm(10, 5, RiskTruncate(30, )), etc.

In Crystal Ball you apply bounds in the visual interface. Note that occasionally a model will have
an acute response to a variable that is within a small range. For example, a model of the amplitude
of vibrations of a car may have a very acute (highly non-linear) response to an input variable
modelling the frequency of an external vibrating force, like the bounce from driving over a slatted
bridge, when that frequency approaches the natural frequency of the car. In that case, the rare event
that needs to be tested is not necessarily an extreme of the input variable but is the scenario that
produces the extreme response in the rest of the model.
(c) Modify the probability of a risk occurring. Often in a risk analysis model we have one or more
risk events. We can simulate them occurring (with some probability) or not in a variety of ways.
We can stress the model to see the effect of an individual risk occurring, or a combination of risks,
by increasing their probability during the test. For example, setting a risk to have 50 % probability
(where perhaps we actually believe it to have 10 % probability) and generating on-screen scenarios
allows us comfortably to watch how the model behaves with and without the risk occurring. Setting
two risks each to a 70 % probability will show both risks occurring at the same time in about 50 %
of the scenarios, etc.
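For readers working outside a spreadsheet add-in, a Python sketch of the truncation idea in point (b), sampling a lognormal (specified by its arithmetic mean and standard deviation) restricted to a range via the inverse CDF, follows; the function name and example bounds are illustrative:

import numpy as np
from scipy.stats import lognorm

def truncated_lognormal(mean, sd, lo=0.0, hi=np.inf, size=1, rng=None):
    """Lognormal with given arithmetic mean and sd, sampled only from [lo, hi]."""
    rng = np.random.default_rng() if rng is None else rng
    var_log = np.log(1 + (sd / mean) ** 2)          # variance of the underlying normal
    mu_log = np.log(mean) - var_log / 2
    dist = lognorm(s=np.sqrt(var_log), scale=np.exp(mu_log))
    u = rng.uniform(dist.cdf(lo), dist.cdf(hi), size)   # uniform over the allowed CDF range
    return dist.ppf(u)

# analogous to VoseLognormal(10,5,,VoseXBounds(30,)): values above 30 only
print(truncated_lognormal(10, 5, lo=30, size=5))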

17.2.6 Comparing results of alternative models
There are often several ways that one could construct a Monte Carlo model to tackle the same problem.
Each method should give you the same answer, of course. So, if you are unsure about one way of
manipulating distributions, then try it another (perhaps less efficient) way and see if the answers are the
same.
The more difficult area is where you may feel that there are two or more completely different stochastic
processes that could explain the problem at hand. Ideally, one would like to be able to construct both
models and see whether they come up with similar answers. But what do we mean by similar? In
fact, from a decision analysis point of view we don't actually mean that they come up with the same
numbers or distributions: we mean that, if presented with either result, the decision-maker would make
the same decision. If we do have the luxury of being able to construct two completely different model
interpretations of the world, we may be able to use a technique called Bayesian model averaging that
weights the likelihood of each model on the basis of how probable they would make our observations.
We nearly always will not have the luxury of being able to model two or more different approaches
to the same problem because of time and resource constraints. If you are going to have to put all
your efforts into one model, try to make sure that your peers agree with your approach, and that the
decision-maker will be comfortable with making a decision based on the model's assumptions. The
decision-maker could prefer you to construct a model that may not be the most likely explanation for
your problem, but that offers the most conservative guidance for managing it.
Finally, simple "back-of-the-envelope" checks can also be useful. Managers will often look at the
results of a risk analysis and compare with their gut feeling and/or a simple calculation. It is surprising
how often a modeller can get too involved in the modelling and pay too little attention to the numbers
that come out at the end.


17.3 Comparing Predictions Against Reality
In many cases, this might be akin to "shutting the stable door after the horse has bolted". Clearly, if you
have made an irreversible decision on the basis of a risk assessment, this exercise may be of limited
value. However, even when that is true, analysing which parts of the model turned out to be the most
inaccurate will help you focus in on how you might improve your risk models for the next decision, or
prepare you for how badly you will have got it wrong.
Perhaps it is possible to structure a decision into a series of steps, each informed by risk analysis, so
that at each step in the series of decisions the risk analysis predictions can be compared against what
has happened so far. For example, setting up an investment that started with a pilot roll-out in a test
market would let a company limit the risks and at the same time evaluate how well it had been able to
predict the initial level of success.
Project risk analysis models, in which the cost and duration of the elements of a project are estimated,
are an excellent example of where predictions can be continuously compared with reality. The uncertainty
of the cost and time elements can be updated as each task is being completed to estimate the remaining
duration and costs, while a review of each task estimate against what actually happened can give you
a feel for whether your estimators have been systematically pessimistic or optimistic. Chapter 13 gives
a number of techniques for monitoring and calibrating expert estimates.

Chapter 18

Discounted cashflow modelling
A typical discounted cashflow model for a potential investment makes forecasts of costs and revenues over the life of the project and discounts those revenues back to a present value. Most analysts start with a "base case" model and add uncertainty to the important elements of the model.
Happily, the mathematics involved in adding risk to these types of model is quite simple. In this
chapter, I will assume that you can build a base case cashflow model that will look something like
Figure 18.2 and I will focus on the input modelling elements of Figure 18.1 and some financial outputs.
There are a number of topics that are already well covered in this book:

Expert estimates. In capital investment models we rely a great deal on expert judgement to estimate
variables like costs, time to market, sales volumes, discount levels, etc. Chapter 14 discusses how
to elicit estimates from subject matter experts.
Fitting distributions to data. We don't usually have a great deal of historic data to work with in
capital investment projects because the investment is new. I have worked with a very successful
retail company that investigates levels of pedestrian traffic at different locations in a town where it
is considering locating a new outlet. It has excellent regional data on how that traffic converts to
till receipts. That is quite typical of the type of data one might have for a cashflow analysis, and I
will go through such a model later in this chapter. Hydrocarbon and mineral exploration will generally have improving levels of data about the reserves, but have specialised methods (e.g. kriging)
for statistically analysing their data, so I won't consider them further here. Otherwise, Chapter 10
discusses distribution fitting in some detail.
Correlation. Simple forms of correlation modelling - recognising that two or more variables are
likely to be linked in some way - are very important in cashflow models. The correlation techniques
described in Sections 13.4 and 13.5 are particularly useful in cashflow models.
Time series. Chapter 12 deals with many different technical time series models. GBM, seasonal and
autoregressive models are useful for modelling inflation, exchange and interest rates over time in a
cashflow model. Lead indicators can help predict market size a short time into the future. In this
chapter I consider variables such as demand for products and sales volumes that are generally built
on a more intuitive basis.
Common errors. Risk analysis cashflow models are not generally that technically complicated, but
our reviews show that the types of error described in Section 7.4 appear very frequently, so I very
much encourage you to read that section carefully. The rest of Chapter 7 offers some ideas on model
building that are very applicable to cashflow models.

Figure 18.1 Modelling elements in a capital investment discounted cashflow model.

[Figure detail not reproduced: a year-by-year cashflow spreadsheet with rows for Total Revenue, Cost of Goods Sold, Gross Margin, Operating Expenses, Earnings Before Taxes, Tax Basis, Income Tax and Net Income; market conditions (Number of Competitors, Unit Cost, Inflation Rate, Tax Rate); sales activity (Sales Price, Market Volume, Sales Volume); and expense lines (Production Expense, Product Development, Capital Expenses, Overhead, Total Expenses).]
Figure 18.2 A typical, if somewhat reduced, discounted cashflow model.


18.1 Useful Time Series Models of Sales and Market Size
18.1.1 Effect of an intervention at some uncertain point in time
Time series variables are often affected by single identifiable "shocks", like elections, changes to a
law, the introduction of a competitor, the start or finish of a war, a scandal, etc. The modelling of the
occurrence of a shock and its effects may need to take into account several elements:
when the shock may occur (this could be random);
whether this changes the probability or impact of other possible shocks;
the effect of the shock - magnitude and duration.
Consider the following problem. People are purchasing your product at a current rate of 88/month,
and the rate appears to be increasing by 1.3 sales/month with each month. However, we are 80 % sure that a competitor is going to enter the market and will do so between 20 and 50 months from now. If the competitor enters the market, they will take about 30 % of your sales. Forecast the number of sales there will be for the next 100 months.
Two typical pathways for this problem are shown in Figure 18.3, and the model that created them
is shown in Figure 18.4. The Bernoulli variable returns a 1 with 80% probability, otherwise a 0. It
is used as a "flag", the 1 representing a competitor entry, the 0 representing no competitor. Other
cells use conditional logic to adapt to the scenario. You can use a Binomial(1, 80%) if your software does not have a Bernoulli distribution. In Crystal Ball this is also called a Yes:No distribution.
The StepUniform generates integer values between 20 and 50, and cell E4 returns the month 1000
if the competitor does not enter the market, i.e. a time beyond the modelled period. It is a good
idea if you use this type of technique to make such a number very far from the range of the modelled period in case someone decides to extend the period analysed. A Poisson distribution is used to
model the number of sales reflecting that the sales are independent of each other and randomly distributed in time. The nice thing about a Poisson distribution is that it takes just one parameter - its
mean, so you don't have to think about variation about that mean separately (e.g. determine a standard
deviation).
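A Python sketch of the same structure (a single iteration; the names and the large sentinel month of 10**6 are illustrative):

import numpy as np

def simulate_sales(n_months=100, base=88.0, trend=1.3, p_entry=0.80,
                   entry_window=(20, 50), share_lost=0.30, rng=None):
    """One iteration of the sales forecast with a possible competitor entry."""
    rng = np.random.default_rng() if rng is None else rng
    competitor = rng.random() < p_entry                    # Bernoulli flag: does a competitor enter?
    entry = rng.integers(entry_window[0], entry_window[1] + 1) if competitor else 10**6
    months = np.arange(1, n_months + 1)
    expected = base + trend * months                       # trend in the underlying sales rate
    expected = np.where(months >= entry, expected * (1 - share_lost), expected)
    return rng.poisson(expected)                           # Poisson randomness in the counts

print(simulate_sales()[:12])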


Figure 18.3 Possible pathways generated by the model depending on whether the competitor enters the
market.


[Figure detail not reproduced: columns for Month, Expected sales, Sales fraction lost and simulated Sales, plus a formulae table including =VoseBernoulli(E2) and =IF(E3=1,VoseStepUniform(20,50),1000).]

Figure 18.4 Model of Poisson sales affected by the possible entry of a competitor.

18.1.2 Distributing market share
When competitors enter an established market they have to establish the reputation of their product and
fight for market share with others that are already established. This takes time, so it is more realistic to
model a gradual loss of market share to competitors.
Consider the following problem. Market volume for your product is expected to grow each year by
PERT(10 %, 20 %, 40 %) beginning next year at PERT(2500, 3000, 5000) up to a maximum of 20 000 units. You
expect one competitor to emerge as soon as the market volume reaches 3500 units in the previous year.
A second will appear at 8500 units. Your competitors' shares of the market will grow linearly until you
all have equal market share after 3 years. Model the sales you will make.
Figure 18.5 shows the model. It is mostly self-explanatory. The interesting component lies in cells
F10:L10, which divides the forecast market for your product among the average of the number of competitors over the last 3 years and yourself (the "1" in the equation). Averaging over 3 years is a neat way of allocating an emerging competitor 1/3 of your market strength in the first year, 2/3 in the second and
equal strength from the third year on - meaning that they will then sell as many units as you. What
is so helpful about this little trick is that it automatically takes into account each new competitor and
when they entered the market, which is rather difficult to do otherwise. Note that we need three zeros
in cells C8:E8 to initialise the model.
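A deterministic Python sketch of the averaging trick (the PERT inputs are replaced by fixed values, and the sketch follows the description in the text rather than the exact cell references):

import numpy as np

def forecast_sales(n_years=8, start=3000.0, growth=0.20, cap=20000.0,
                   thresholds=(3500.0, 8500.0)):
    """Share a growing market with competitors whose strength ramps in over three years."""
    comp_counts = [0, 0, 0]          # three zeros initialise the averaging window
    prev_volume, volume, results = 0.0, start, []
    for _ in range(n_years):
        # competitors appear once last year's volume passed 3500 and then 8500 units
        comp_counts.append(sum(prev_volume > t for t in thresholds))
        # dividing by (3-year average competitor count + 1) gives a new competitor
        # 1/3, then 2/3, then full strength in successive years
        sales = round(volume / (np.mean(comp_counts[-3:]) + 1))
        results.append((round(volume), sales))
        prev_volume, volume = volume, min(cap, volume * (1 + growth))
    return results

print(forecast_sales())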

[Figure detail not reproduced: year-by-year Market volume and Sales volume rows, plus a formulae table including =VosePERT(10%,20%,40%), =VosePERT(2500,3000,5000), =IF(E9>$C$4,2,IF(E9>$C$3,1,0)), =MIN(20000,E9*(1+$C$5)) and =ROUND(E9/(AVERAGE(C8:E8)+1),0).]

Figure 18.5 Model of sales where the total market is shared with new-entry competitors.

18.1.3 Reduced sales over time to a finite market
Some products are essentially a once-in-a-lifetime purchase, e.g. life insurance, a big flat-screen TV,
a new guttering system or a pet identification chip. If we are initially quite successful in selling the
product into the potential market, the remaining market size decreases, although this can be compensated
for to some degree by new potential consumers. Consider the following problem: There are currently
PERT(50000, 55 000, 60000) possible purchasers of your product. Each year there will be about a
10 % turnover (meaning 10 % more possible purchasers will appear). The probability that you will
sell to any particular purchaser in a year is PERT(10%, 20%, 35 %). Forecast sales for the next
10 years.
Figure 18.6 shows the model for this problem. Note that C8:C16 is subtracting sales already made
from the previous year's market size but also adding in a regenerated market element. The binomial distribution then converts the current market size to sales. In the particular scenario shown in Figure 18.6,
the probability of selling is high (26 %), so sales start off high and drop off quickly as the regeneration
rate is so much lower (10 %). Note that some Monte Carlo software cannot handle large numbers of trials
in their binomial distribution, in which case you will need to use a Poisson or normal approximation
(Section III.9.1).
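A Python sketch of the same idea, with the PERT inputs replaced by fixed illustrative values:

import numpy as np

def finite_market_sales(n_years=10, market0=55000, turnover=0.10, p_buy=0.20, rng=None):
    """Sales to a finite market that loses each year's buyers but regenerates by about 10 %."""
    rng = np.random.default_rng() if rng is None else rng
    market, sales = float(market0), []
    for _ in range(n_years):
        sold = rng.binomial(int(market), p_buy)        # convert current market size to sales
        sales.append(sold)
        market = market - sold + market * turnover     # remove buyers, add new prospects
    return sales

print(finite_market_sales())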

18.1.4 Growth of sales over time up to a maximum as a function
of marketing effort
Sometimes we might find it easier to estimate what our annual sales will be when stabilised, but be
unsure of how quickly we will be able to achieve that stability. In this sort of situation it can be easier
to model a theoretical maximum sales and match it to some ramping function. A typical form of such

466

Risk Analysis

Figure 18.6 Model forecasting sales over time to a finite market.

a ramping function r(t) is

which will produce a curve that starts at 0 for t = 0 and asymptotically reaches 1 at an infinite value of t, but reaches 0.5 at t = t1/2. Consider the following problem: you expect a final sales rate of PERT(1800,
2300, 3600) and expect to achieve half that in the next PERT(3.5, 4, 5) years. Produce a sales forecast
for the next 10 years.
Figure 18.7 provides a solution.
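The ramp formula itself is not shown above; one function with the stated properties (0 at t = 0, 0.5 at t = t1/2, tending to 1 as t grows) is r(t) = t/(t + t1/2), and the Python sketch below uses it purely as an illustration with fixed values in place of the PERT inputs:

import numpy as np

def ramped_sales_forecast(n_years=10, max_sales=2300.0, t_half=4.0):
    """Sales ramping towards a theoretical maximum: max_sales * t / (t + t_half)."""
    t = np.arange(1, n_years + 1, dtype=float)
    return max_sales * t / (t + t_half)

print(np.round(ramped_sales_forecast(), 0))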

18.2 Summing Random Variables
Perhaps the most common errors in cashflow modelling occur when one wishes to sum a number of
random costs, sales or revenues. For example, imagine that you expect to have Lognormal(100 000, 25 000) customers enter your store per year and they will spend $Lognormal(55, 12) each - how would you estimate the total revenue? People generally write something like

Total revenue = ROUND(Lognormal(100 000, 25 000), 0) * Lognormal(55, 12)    (18.1)

using the ROUND function in Excel to recognise that the number of people must be discrete. But
let's think what happens when the software starts simulating. It will pick a random value from each


Figure 18.7 Model forecasting ramping sales to an uncertain theoretical maximum.

distribution and multiply them together. Picking a reasonably high till receipt, the probability that a
random customer will spend more than $70, for example, is about 11 %. The probability that two people will do the same is 11 % * 11 % = 1.2 %, and the probability that
thousands of people will spend that much is infinitesimally small. However, Equation (18.1) will assign
an 11 % probability that all customers will spend over $70 no matter how many there are. The equation is wrong because it should have summed ROUND(Lognormal(100 000, 25 000), 0) separate Lognormal(55,
12) distributions. That's a big, slow model, so we use a variety of techniques to shortcut to the answer,
which is the topic of Chapter 11.

18.3 Summing Variable Margins on Variable Revenues
A common situation is that we have a large random number of revenue items that follow the same
probability distribution but that are independent of each other, and we have independent profit margins
that follow another distribution that must be applied to each revenue item. This type of model quickly
becomes extremely cumbersome to implement because for each revenue item we need two distributions,
one for revenue and another for the profit margin, and we may have large numbers of revenue items. It
is such a common problem that we designed a function in ModelRisk to handle this, allowing you to
keep the model to a manageable size, speeding up simulation time and making the model far simpler
to review. Perhaps most importantly, it allows you to avoid a lot of conditional logic that is easy to get wrong (I apologise if this comes across as a sales pitch for ModelRisk, but it is designed with finance people in mind). Consider the following problem. A venture capital company is considering investing in


a company that makes TV shows. They expect to make PERT(28, 32, 39) pilots next year which will
generate a revenue of $PERT(120, 150, 250)k each independently and from which the profit margin is
PERT(1 %, 5 %, 12 %). There is a 30 % chance that each pilot is made into a TV show in that country
running for Discrete({1, 2, 3, 4, 5}, {0.4, 0.25, 0.2, 0.1, 0.05}) series, where each season of each series
generates $PERT(120, 150,250)k with margins of PERT(15 %, 25 %, 45 %). There is a 20 % chance that
these local series will be sold to the US, generating $PERT(240, 550, 1350)k per season sold, of which
the profit margin is PERT(65 %, 70 %, 85 %). What is the total profit generated from next year's pilots?
The problem is not technically difficult, but the scale of the modelling explodes very quickly. We
worked on the model for a real investment of this type and it had many more layers: pilots in several
countries, merchandising of various types, repeats, etc., and it took a lot of effort to manage. Figure 18.8
shows a surprisingly succinct model: rows 2 to 11 are the input data, rows 14 to 16 are the actual
calculations.
There are a few things to point out. In cell F2, 1/2 is subtracted and added to the minimum and
maximum estimates respectively of the number of pilots to give a more realistic chance of their occurrence after rounding. Distributions are input as ModelRisk objects in cells F3, F4, F6, F7, F8, F10
and F11 because we want to use these distributions many times. Cell C16, and elsewhere, uses the
VoseSumProduct function to add together revenue * margin for each pilot, where the revenue and

Pilots made
Series made
Seasons made
Profit

NA
NA
295

F2
F3 (F4, F7, Fl0, F l 1 similar)
F6
F14
El4
Dl4
D l 5:E15
C16
D l6
El6
F16 (output)

Local only
8
15
68 1

Local & US
series
3
9
3933

Total
11
4909

Formulae table
=ROUND(VosePERT(8-0.5,11,17+0.5),0)
=VosePE~TObject(l20,150,250)
=VoseDiscreteObject({l,2,3,4,5),{0.4,0.25,0.2,0.1,0.05))
=VoseBinomial(F2,F5)
=VoseBinomial(F14,F9)
=F14-El4
=VoseAggregateMC(D14,$F$6)
=VoseSumProduct(F2,F3,F4)
=VoseSumProduct(D15,F7,F8)
=VoseSumProduct(El5,F7,F8)+VoseSumProduct(El5,F10,F11)
=SUM(C16:El6)

Figure 18.8 Model forecasting profits from TV series.


margin distributions are defined by the distribution objects in cells F3 and F4 respectively. Cell F14
simulates the number of pilots that made it to become series, from which the model determines how
many of those become series also sold into the US in cell E14, the difference being the number of
pilots that only became local series in cell D14. Setting up the logic this way ensures that we have a
consistent model: the local only and the US and local series always add up to the total series produced.
Cells D15 and E15 use the VoseAggregate(x, y) function to simulate the sum of x random variables all
taking the same distribution y defined as an object.
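The following Python sketch shows the quantity that a single sum-product call simulates - the sum of n independent revenue times margin draws - using a common Beta-based PERT parameterisation; it is an illustration, not ModelRisk's internals:

import numpy as np

rng = np.random.default_rng(2)

def pert(a, b, c, size):
    """PERT(min, mode, max) as a scaled Beta with the usual lambda = 4 weighting."""
    alpha = 1 + 4 * (b - a) / (c - a)
    beta = 1 + 4 * (c - b) / (c - a)
    return a + (c - a) * rng.beta(alpha, beta, size)

def sum_product(n, rev_params, margin_params):
    """Sum over n items of an independent revenue draw times an independent margin draw."""
    return (pert(*rev_params, n) * pert(*margin_params, n)).sum()

# e.g. profit from 32 pilots at $PERT(120,150,250)k revenue and PERT(1%,5%,12%) margins
print(sum_product(32, (120, 150, 250), (0.01, 0.05, 0.12)))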

18.4 Financial Measures in Risk Analysis
The two main measures of profitability in DCF models are net present value (NPV) and internal rate
of return (IRR). The two main measures of financial exposure are value at risk (VAR) and expected
shortfall. Their pros and cons are discussed in Section 20.5.

18.4.1 Net present value
Net present value (NPV) attempts to determine the present value of a series of cashflows from a project
that stretches out into the future. This present value is a measure of how much the company is gaining
at today's money by undertaking the project: in other words, how much more the company itself will
be worth by accepting the project.
An NPV calculation discounts future cashflows at a specified discount rate r that takes account of:

1. The time value of money (e.g. if inflation is running at 4 %, £1.04 in a year's time is only worth
£1.00 today).
2. The interest that could have been earned over inflation by investing instead in a guaranteed investment.
3. The extra return that is required over parts 1 and 2 to compensate for the degree of risk that is being
accepted in this project.
Parts 1 and 2 are combined to produce the risk-free interest rate, rf. This is typically determined as
the interest paid by guaranteed fixed-payment investments like government bonds with a term roughly
equivalent to the duration of the project.
The extra interest r* over rf needed for part 3 is determined by looking at the uncertainty of the
project. In risk analysis models, this uncertainty is represented by the spread of the distributions of
cashflow for each period. The sum of r* and rf is called the risk-adjusted discount rate r.
The most commonly used calculation for the NPV of a cashflow series over n periods is as follows:

NPV = Σ Ci / (1 + r)^i,   the sum being taken over the n cashflow periods i

where Ci are the expected (i.e. average) values of the cashflows in each period and r is the risk-adjusted
discount rate.
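As a sketch of the calculation in code (the cashflow values are purely illustrative):

def npv(cashflows, r):
    """Net present value: discount the cashflow of period i by (1 + r)^i.
    cashflows[0] is the (usually negative) cashflow at time zero."""
    return sum(c / (1 + r) ** i for i, c in enumerate(cashflows))

# an outlay of 1000 followed by five expected cashflows of 300, discounted at r = 10 %
print(round(npv([-1000, 300, 300, 300, 300, 300], 0.10), 1))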
NPV calculations performed in a risk analysis spreadsheet model are usually presented as a distribution
of NPVs because the cashflow values selected in the NPV calculations are their distributions rather than

470

R~skAnalys~s

their expected values. Theoretically, this is incorrect. Since an NPV is the net present value, it can have
no uncertainty. It is the amount of money at which the company values the project today. The problem
is that we have double-counted our risk by first discounting at the risk-adjusted discounted rate r and
then showing the NPV as a distribution (i.e. it is uncertain).
Two theoretically correct methods for calculating an NPV in risk analysis are discussed below, along
with a more practical, but strictly speaking incorrect, alternative:
Theoretical approach 1: Discount the cashflow distributions at the risk-free rate. This produces
a distribution of NPVs at rf and ensures that the risk is not double-counted. However, such
a distribution is not at all easy to interpret since decision-makers will almost certainly never
have dealt with risk-free rate NPVs and therefore have nothing to compare the model output
against.
Theoretical approach 2: Discount the expected value of each cashflow at the risk-adjusted discount
rate. This is the application of the above formula. It results in a single figure for the NPV of
the project. A risk analysis is run to determine the expected value and spread of the cashflows
in each period. The discount rate is usually determined by comparing the riskiness associated
with the project's cashflows against the riskiness of other projects in the company's portfolio.
The company can then assign a discount rate above or below its usual discount rate, depending
on whether the project being analysed exhibits more or less risk than the average. Some companies determine a range of discount rates (three or so) to be used against projects of different
riskiness.

The major problems of this method are that it assumes the cashflow distributions are symmetric and
that no correlation exists between cashflows. Distributions of costs and returns almost always exhibit
some form of asymmetry, and in a typical investment project there is also always some form of correlation between cashflow periods. For example, sales in one period will be affected by previous sales,
a capital injection in one period often means that it doesn't occur in the next one (e.g. expansion of a
factory) or the model may include an autocorrelated time series forecast of prices, production rates or
sales volume. If there is a strong positive correlation between cashflows, this method will overestimate
the NPV. Conversely, a strong negative correlation between cashflows will result in the NPV being
underestimated. The correlation between cashflows may take any number of, often complex, forms. I
am not aware of any financial theory that provides a practical method for adjusting the NPV to take
account of these correlations.
In practice, it is easier to apply the risk-adjusted discount rate r to the cashflow distributions to produce a distribution of NPVs. This method incorporates correlation between distributions automatically
and enables the decision-maker to compare directly with past NPV analyses.
As I have already explained, the problem associated with this technique is that it will double-count
the risk: firstly in the discount rate and then by representing the NPV as a distribution. However, if
one is aware of this shortfall, the result is very useful in determining the probability of achieving the
required discount rate (i.e. the probability of a positive NPV). The actual NPV to quote in a report
would be the expected value of the NPV distribution.

18.4.2 Internal rate of return
The internal rate of return (IRR) of a project is the discount rate applied to its future cashflows such
that it produces a zero NPV. In other words, it is the discount rate that exactly balances the value of

ii

Chapter 18 Discounted cashflow modelling

471

all costs and revenues of the project. If the cashflows are uncertain, the IRR will also be uncertain and
therefore have a distribution associated with it.
A distribution of the possible IRRs is useful to determine the probability of achieving any specific
discount rate, and this can be compared with the probability other projects offer of achieving the target
discount rate. It is not recommended that the distribution and associated statistics of possible IRRs be
used for comparing projects because of the properties of IRRs discussed below.
Unlike the NPV calculation, there is no exact formula for calculating the IRR of a cashflow series.
Instead, a first guess is usually required, from which the computer will make progressively more accurate
estimates until it finds a value that produces an NPV as near to zero as required.
If the cumulative cashflow position of the project passes through zero more than once, there is more
than one valid solution to the IRR equation. This is not normally a problem with deterministic models
because the cumulative cashflow position can easily be monitored and the smallest of any IRR solutions
selected. However, a risk analysis model is dynamic, making it difficult to appreciate its exact behaviour.
Thus, the cumulative cashflow position may pass through zero and back in some of the risk analysis
iterations and not be spotted. This can produce quite inaccurate distributions of possible IRRs. In order
to avoid this problem, it may be worth including a couple of lines in your model that calculate the
cumulative cashflow position and the number of times it passes through zero. If this is selected as a
model output, you will be able to determine whether this is a statistically significant problem and alter
the first guess to compensate for it.
IRRs cannot be calculated for only positive or only negative cashflows. IRRs are therefore not useful
for comparing between two purely negative or positive cashflow options, e.g. between hiring or buying
a piece of equipment.
It is difficult to compare distributions of IRR between two options unless the difference is very
large. Stochastic dominance tests (Section 5.4.5) will certainly be of little direct use. This is because a
percentage increase in an IRR at low returns (e.g. from 3 to 4 %) is of much greater real value than
a percentage increase at high returns (e.g. from 30 to 31 %). Consider the following illustration. I am
offered payments of £20 a year for 10 years (i.e. £200 total) in return for a single payment now. I am
asked to pay £200 - obviously a bad investment giving an IRR of 0 %. I negotiate to drop the price and
thereby produce a positive IRR. Figure 18.9 illustrates the relationship between the reduction in price
I achieve and the resulting IRR. The reduction in price I achieve is directly equivalent to the increase
in the present value of the investment, so the graph relates real value to IRR. As the saving I make

[Figure detail not reproduced: IRR plotted against the saving achieved (£0 to £150), with a fitted straight line at low IRR values shown for comparison.]

Figure 18.9 An example of the non-linear relationship between IRR and real (or present) value.


approaches £200, the IRR approaches infinity. Clearly there is no straight-line relationship between IRR
and true value. It is therefore very difficult to compare the value of two projects in terms of the IRR
distributions they offer. One project may offer a long right-hand tail that can easily increase the expected
IRR, but in real-value terms this could easily be outweighed by a comparatively small diminishing of
the left-hand tail of the other option.
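A Python sketch of the IRR search and of the worked example above (a simple bisection that assumes a single sign change of the NPV; as discussed earlier, real models should monitor multiple sign changes):

def npv(cashflows, r):
    return sum(c / (1 + r) ** i for i, c in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-8):
    """Bisection for the discount rate that gives a zero NPV."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(cashflows, lo) * npv(cashflows, mid) <= 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

# pay 200 now for 20 a year for 10 years: IRR = 0 %; negotiate the price down to 150
# and the IRR rises to about 5.6 % - the relationship between saving and IRR is non-linear
print(round(irr([-200] + [20] * 10), 4), round(irr([-150] + [20] * 10), 4))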

Chapter 19

Project risk analysis
Project risk analysis is concerned with the assessment of the risks and uncertainties that threaten a
project. A "project" consists of a number of interrelated tasks whose aim is to produce a specific result
or results. Typically, a project risk analysis consists of analysing schedule and cost risk, although other
aspects like the quality of the final product are sometimes included. There will also often be an analysis
of the cashflow of the project, especially at the conception and bidding stages, but these are not discussed
here as a cashflow model is fairly simple to produce.
A cost risk analysis consists of looking at the various costs associated with a project, their uncertainties
and any risks or opportunities that may affect these costs. Risks and opportunities are defined as discrete
possible events that will increase and decrease the project costs respectively. They are both characterised
by estimates of their probability of occurrence and the magnitude of their impact. The distributions of
cost are then added up in a risk analysis to determine the uncertainty in the total cost of the project.
A schedule risk analysis looks at the time required to complete the various tasks associated with a
project, and the interrelationship between these tasks. Risks and opportunities are identified for each task
and an analysis is performed to determine the total duration of the project and, usually, the durations
until specific milestones within the project are achieved. A schedule risk analysis is generally more
complex to perform than a cost risk analysis because the logical connections between the tasks have to
be modelled in order to determine the critical path. For this reason, we will look at the elements of a
cost risk analysis first.
A project's cost and duration are, in reality, linked together. Tasks in a project are often quantified by, among other things, the number of person-weeks (amount of work) needed to complete them. The duration of the task is then equal to [person-weeks]/[people on the job], and the cost equals [person-weeks] * [labour rate]. Costs and durations are also linked if the model includes a penalty clause for exceeding a deadline.
Cost elements and, particularly, schedule durations are also often correlated. Correlation, or dependency, modelling has been described in detail in Chapter 12 and will not be repeated here. However, it
is important to be aware that dependencies often exist in a risk analysis model, and failure to include
them in an analysis will generally underestimate the risk.
In this chapter, we will assume that a preliminary exercise has already been carried out to identify
the various risks associated with the project. We will assume that a risk register has been drawn up
(see Section 1.6) and that sufficient information has been gathered to be able adequately to quantify the
probabilities associated with each risk and the size of potential impact on the tasks of the project.
A project risk analysis is often completed after a more rudimentary (deterministic) analysis that uses
single-point estimates for each task duration and cost. A comparison of the results of this deterministic
analysis with those of the risk analysis, where distributions have been used to model uncertainty components, often surprises people. Somehow, one expects that a deterministic analysis based on values one
thinks most likely to occur should produce results that equate to the mode of the risk analysis output
distribution. In fact, it turns out that a risk analysis model will provide a mode and mean that are nearly


always greater than the deterministic model result. Sometimes the risk analysis output distribution will
not even include the deterministic result! The main reason for this is that the distributions one assigns to
uncertainty components are nearly always right skewed, i.e. they have a longer tail to the right than to
the left. This is because there are many more things that can go wrong than go right, and because we are
always trying to place emphasis on doing the job as quickly and cheaply as possible. Thus, the model
distributions nearly always have more probability to the right of the mode than to the left, which means
that, in the aggregate, for most models one is much more likely to have a scenario that exceeds the
deterministic scenario. A schedule risk analysis will diverge even more from its deterministic equivalent
than a cost model because any task whose commencement depends on the finish of two or more other
tasks begins at the maximum of the samples from the distributions of finish dates of the other tasks, not
the maximum of their modes.

19.1 Cost Risk Analysis
A cost risk analysis is usually developed from a work breakdown structure (WBS) which is a document
that details, from the top down, the various work packages (WPs) comprising the project. Each WP
may then be subdivided into a bill of quantities and estimates of the labour required to complete
them.
There will usually be a number of cost items associated with each WP that have an element of
uncertainty. In addition, there may be discrete events (risks or opportunities) that could change the size
of these costs. The normal uncertainties in the cost items are modelled by continuous distributions like
the PERT or triangular distribution. I will use the triangular distribution for the rest of this chapter for
simplicity, but the reader should by now be aware (see Section 14.3.2) of the misgivings I have about
this distribution. The impact of the risks and opportunities will similarly be modelled by continuous
distributions, but whether they occur or not is modelled with a discrete distribution. To illustrate this,
consider the following example.
Example 19.1

A new office block has been designed to be roofed with corrugated galvanised steel at a cost of between £165 000 and £173 000, but most probably £167 000. However, the council's planning department has received a number of objections from local residents. The architect thinks there is about a 30 % chance that the building will have to be roofed with slate at a cost of between £193 000 and £208 000, but most likely about £198 000.
Figure 19.1 shows how to model the roofing's cost using triangular distributions. The model selects
a steel roof (cell C5) for 70 % of the scenarios and a slate roof (cell C6) for the remaining 30 % to
produce a combined uncertainty in cell C8. +
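To make the mechanics concrete, here is a minimal Monte Carlo sketch of the same two-scenario logic in Python (an illustration, not the book's spreadsheet; the triangular parameters and the 70 %/30 % split are taken from Figure 19.1, while the iteration count and random seed are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
n = 100_000  # Monte Carlo iterations

# Triangular cost distributions for the two roofing options
steel = rng.triangular(165_000, 167_000, 173_000, n)
slate = rng.triangular(193_000, 198_000, 203_000, n)

# Discrete choice: 70% of scenarios use the steel roof, 30% the slate roof
use_slate = rng.random(n) < 0.30
roof_cost = np.where(use_slate, slate, steel)

print(f"mean roof cost  : {roof_cost.mean():,.0f}")
print(f"90th percentile : {np.percentile(roof_cost, 90):,.0f}")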
Many of the cost items for the project will be in the form: x items @ £y/item, where x and y are both uncertain quantities. At first sight it seems logical simply to multiply these two variables together to get the cost, i.e. cost = x * y. However, there is a potential problem with this approach (determining the sum of random variables is discussed in detail in Chapter 11). Consider the following two examples.
Example 19.2

A ship's hull consists of 562 plates, each of which has to be riveted in place. Each plate is riveted by
one worker. The supervisor, reflecting on the efficiency of her workforce, considers that the best her
riveters have ever done is put up a plate in 3 h 45 min. On the other hand, the longest they have taken

[Figure 19.1 spreadsheet: Model of the cost of the roof of the new office block]

                                Distribution   Minimum    Most likely   Maximum    Probability
Galvanised steel roof           £168,333       £165,000   £167,000      £173,000   70%
Slate roof                      £198,000       £193,000   £198,000      £203,000   30%
Combined estimate (C8, output)  £168,333

Formulae table
C5:C6         =VoseTriangle(D5,E5,F5)
C8 (output)   =VoseDiscrete(C5:C6,G5:G6)

Figure 19.1 Cost model for Example 19.1.

is about 5 h 30 min, and it is far more likely that a plate would be riveted in about 4 h 15 min. Each riveter is paid £7.50 an hour. What is the total labour cost for riveting? One's first thought might be to model the total cost as follows:

Cost = 562 * Triangle(3.75, 4.25, 5.5) * £7.50

What happens if we run a simulation on this formula? In some iterations we will produce values close
to 3.75 from the triangular distribution. This is saying that all of the plates could have been riveted in
record time - clearly not a realistic scenario. Similarly, some iterations will generate values close to 5.5
from the triangular distribution - the scenario that the workforce took as long to put up every plate on
the ship as it took them to do the trickiest plate in memory.
The problem lies in the fact that the triangular distribution is modelling the uncertainty of an individual
plate but we are using it as if it were the distribution of the average time for 562 plates.
There are several approaches (Chapter 11) to the correct modelling of this problem. The easiest is
to model each plate separately, i.e. set up a column of 562 Triangle(3.75,4.25, 5.5) distributions, add
them up and multiply the sum by £7.50. While this is quite correct, it is obviously impractical to use a
spreadsheet model of 562 cells just for this one cost item, so the technique is only really useful if there
are just a few items to be summed, or one could use VoseAggregateMC(562, VoseTriangleObject(3.75,
4.25, 5.5)).
Another option is to apply the central limit theorem (see Section 6.3.3). The mean μ and standard deviation σ of a Triangle(3.75, 4.25, 5.5) distribution are

μ = (3.75 + 4.25 + 5.5)/3 = 4.5
σ = √((3.75² + 4.25² + 5.5² - 3.75*4.25 - 3.75*5.5 - 4.25*5.5)/18) = 0.368

Since there are 562 items, the distribution of the total person-hours for the job is given by

Total person-hours = Normal(4.5 * 562, 0.368 * √562) = Normal(2529, 8.724)

Then, the total labour cost for riveting is estimated as

Cost = Normal(2529, 8.724) * £7.50

+
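A small NumPy sketch (not from the book) contrasts the three treatments of this calculation: the incorrect single-triangle shortcut, the explicit sum over 562 plates, and the central limit theorem approximation. The iteration count and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(2)
n_iter, n_plates, rate = 20_000, 562, 7.50

# Wrong: one triangular sample treated as if it were the average time for all 562 plates
wrong = n_plates * rng.triangular(3.75, 4.25, 5.5, n_iter) * rate

# Correct: sum 562 independent plate times in every iteration
correct = rng.triangular(3.75, 4.25, 5.5, (n_iter, n_plates)).sum(axis=1) * rate

# Central limit theorem approximation: Normal(4.5 * 562, 0.368 * sqrt(562)) person-hours
clt = rng.normal(4.5 * n_plates, 0.368 * np.sqrt(n_plates), n_iter) * rate

for name, x in [("single triangle", wrong), ("summed plates", correct), ("CLT normal", clt)]:
    print(f"{name:16s} mean = {x.mean():8.0f}   sd = {x.std():7.1f}")

The summed and CLT versions agree closely, while the single-triangle shortcut produces a standard deviation about √562 (roughly 24) times too large.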


Once each cost item for the project has been identified, along with any associated risks and uncertainties, a model can be produced to estimate the total project cost. Figure 19.2 illustrates the sort of model
structure that could be used. In this example, each item is clearly defined, along with the assumptions
that are used in its estimation and the impacts and probabilities of any risks.
The results from a simulation of the total project cost can be represented in a number of ways, as
described in Chapter 5. An analysis of the project's costs will generally be used to produce figures for
the budget and contingency and, if the project is being commissioned by another organisation, a bid
price. Section 15.3.2 describes how these are determined.
One question that management will often ask is how the budget and contingency are distributed back
among the cost items. This knowledge will help the project manager to keep an eye on how the project
is progressing. My approach is to distribute back the budget and contingency costs so that the figures
associated with each cost item have the same probability of being exceeded. Using this approach will
give each cost item the same chance of coming in within its budget figure or its (budget + contingency)
figure and will avoid controllers of some cost items being given almost impossible targets to meet
and others easy targets. This method of distributing budget and contingency costs among cost items is
demonstrated in the following example, using the cost model from Figure 19.2.

[Figure 19.2 spreadsheet (only partially recoverable): each work package (planning, earthworks, labour, road surfacing, etc.) is given a triangular cost distribution (minimum, most likely, maximum) together with a discrete risk such as "Initial application refused", "Water table problem" or "First choice contractor not available", each with its probability of occurrence; the work package totals sum to a total project cost of about £299,400.]

Figure 19.2 Example of a project cost model structure.


Example 19.3

Figure 19.3 shows the cumulative distribution of the project's total cost. The mean of the generated values, £303 856, is selected as the budget, and the (80th percentile - budget), i.e. £308 588 - £303 856 =
£4732, is selected as the risk contingency. The budget is then the cost that the organisation will realistically try to achieve or better, and the contingency is the additional amount put aside should the need
arise. If the cost risk model is accurate, there is a 53 % chance that the budget will be sufficient and an
80 % chance that the project's costs will not exceed the budget + contingency.
In order to be able to distribute the budget and contingency back among the cost items, each cost
item must be nominated as an output of the model. The generated data points from each cost item are
then output to a spreadsheet, and each column of values is then ranked separately in ascending order
(Figure 19.4). Then, the costs generated for all items in each row are summed to give a total project cost.

Figure 19.3 The cumulative distribution of total project cost for the model in Figure 19.2.

[Figure 19.4 spreadsheet (only partially recoverable): the generated cost values for each work package are ranked in ascending order in each column and the totals are calculated for each row; the rows whose totals match the project budget and the project (budget + contingency) then provide the work package budgets and the work package (budget + contingency) figures respectively.]

Figure 19.4 Distributing the budget and (budget + contingency) values back among the project's individual cost items for Example 19.3.


Next, the budget and (budget + contingency) values are looked up in the column of summed costs and the values most precisely equating to the calculated budget and (budget + contingency) figures are determined. The values that appear in the same row for each cost item are then nominated as the budgets and (budget + contingency) values for the cost items. Figure 19.4 illustrates this process. More iterations would allow item budget and (budget + contingency) values to be determined more precisely. +
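The ranking-and-matching procedure of Example 19.3 is easy to prototype outside a spreadsheet. The sketch below is illustrative only: it assumes the simulated cost values are already held in a NumPy array with one column per cost item, takes the budget as the mean total cost and the contingency at the 80th percentile as in the example, and uses invented dummy data.

import numpy as np

def allocate_budgets(samples, contingency_percentile=80.0):
    """samples: array of shape (iterations, cost items) of simulated costs.
    Returns per-item budget and (budget + contingency) figures that share
    the same probability of being exceeded."""
    ranked = np.sort(samples, axis=0)        # rank each item's costs in ascending order
    row_totals = ranked.sum(axis=1)          # total of each ranked row

    totals = samples.sum(axis=1)
    budget_target = totals.mean()                                        # project budget
    contingency_target = np.percentile(totals, contingency_percentile)   # budget + contingency

    budget_row = np.argmin(np.abs(row_totals - budget_target))
    conting_row = np.argmin(np.abs(row_totals - contingency_target))
    return ranked[budget_row], ranked[conting_row]

# usage with dummy data: 5 cost items, 10 000 iterations
rng = np.random.default_rng(3)
sims = rng.triangular(90, 100, 130, (10_000, 5))
item_budget, item_budget_plus_cont = allocate_budgets(sims)
print(item_budget.round(1), item_budget_plus_cont.round(1))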

19.2 Schedule Risk Analysis
Schedule risk analysis uses the same principles as cost risk analysis for modelling general uncertainty
and risks and opportunities. However, it must also cope with the added complexity of modelling the
interrelationships between the various tasks of a project. This section looks at the simple building blocks
that typically make up a schedule risk analysis and then shows how these elements are combined to
produce a realistic model.
A number of software tools allow the user to run Monte Carlo simulations on standard project planning
applications like Microsoft Project and Open Plan. However, these products do not have the flexibility to
model discrete risks and feedback loops, described below, which are common features of a real project.
For now, the most flexible environment for project schedule modelling remains the spreadsheet, and all
of the examples below are illustrated in this format.
A project plan consists of a number of individual tasks. The start and finish dates of these tasks can be related in a number of ways:

1. One task cannot start until another has finished (the link is called finish-start or F-S). This is the most common type of linking in project planning models.
2. One task cannot start until another task has started (start-start or S-S).
3. One task cannot start until another has been partially completed (start-start + lag or S-S + x).
4. One task cannot finish until another has finished (finish-finish or F-F).
5. One task cannot finish until another is a certain way from finishing too (finish-finish - lag or F-F - x).

Figure 19.5 shows how these interrelationships are represented diagrammatically. In the rest of this
chapter we will use the notation "(a, b, c)" to denote a Triangle(a, b, c) distribution. So: Lag(5, 6, 7)
weeks is a lag modelled by a Triangle(5, 6, 7) distribution in units of weeks, etc.
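As a simple illustration of the date arithmetic behind these link types (not taken from the book's models; the task durations and lags below are invented), a few lines of Python:

import numpy as np

rng = np.random.default_rng(4)
tri = lambda a, b, c: rng.triangular(a, b, c)   # "(a, b, c)" notation for a Triangle distribution

start = 0.0                         # project start, in weeks

s1 = start                          # Task 1 starts at the project start
f1 = s1 + tri(7, 9, 12)

s2 = f1                             # F-S link: Task 2 cannot start until Task 1 has finished
f2 = s2 + tri(6, 7, 9)

s3 = s2 + tri(2, 3, 7)              # S-S + lag: Task 3 starts Lag(2, 3, 7) weeks after Task 2 starts
f3 = max(s3 + tri(8, 9, 13), f2)    # F-F link: Task 3 cannot finish before Task 2 finishes

print(f"finish of Task 3 in this iteration: week {f3:.1f}")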
It is essential that a schedule risk model is set up to be as clear as possible, because it can easily become
rather complex. Figure 19.6 illustrates a useful format, where all of the assumptions are immediately
apparent.

[Figure 19.5 diagram: panels illustrating the Finish-Start, Start-Start, Start-Start + lag and Finish-Finish links.]

Figure 19.5 Diagrammatic representation of the common types of task linking in project planning.


[Figure 19.6 spreadsheet (only partially recoverable): five tasks starting from a project start date of 29-Mar-10, each with a triangular duration in weeks (e.g. Task 1: (7, 9, 12), Task 2: (6, 7, 9)); finish dates are calculated as start date + 7 * duration, start and finish lags carry their own triangular distributions, and MAX formulae implement the link logic between tasks.]

Figure 19.6 Example of a project schedule model structure.

[Figure 19.7 spreadsheet (only partially recoverable): task 2 has a normal duration scenario and a risk scenario with an additional duration, with probabilities of 80 % and 20 % respectively; the finish of task 2 is the discrete combination of the two scenario finish dates.]

Figure 19.7 Modelling schedule risk as additional duration.

More complex relationships involving several tasks can now be constructed. So far, bold lines have
been used to indicate the links between tasks. Dashed lines are now introduced to illustrate links that may
or may not occur (i.e. risks and opportunities). Risks and opportunities can be modelled in two ways: by
modelling the additional impact on the task's duration should the risk occur, as shown in Figure 19.7,
or by separately modelling the total durations of the task should the risk occur or not occur, as shown in


[Figure 19.8 spreadsheet (only partially recoverable): task 2's total duration is modelled as separate scenario rows (normal duration, risk, opportunity), each with its own triangular duration and probability of occurrence, and the start of the following task is a discrete distribution over the scenario finish dates.]

Figure 19.8 Modelling schedule risk as alternative durations.

Figure 19.8. In the example of Figure 19.7, task 2 is expected to take (6,7,9) weeks, but there is a 20 %
chance of a problem occurring that would extend the task's duration by (4,6,9) weeks. In Figure 19.8,
task 2 is estimated to take (6,7,9) weeks, but there is a 20 % chance that a particular risk will increase
its duration to (10, 12, 15); there is also an opportunity, with about a 10 % chance of occurring, that
would reduce the duration to (5, 6, 8). The start of task 3 is equal to a discrete distribution of the finish
dates of task 2's possible scenarios.
The most common multiple relationship between tasks in a project schedule is where one task cannot
start until several others have finished, which is modelled using the MAX function. This completes our
look at all of the basic building blocks of a schedule risk model. Figures 19.9 and 19.10 illustrate all
of these building blocks being used together to model Example 19.4.
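A compact sketch of these building blocks working together (purely illustrative: the 20 %/10 % risk and opportunity probabilities and the (6, 7, 9), (10, 12, 15) and (5, 6, 8) durations follow the description of Figure 19.8, while the other task durations are invented):

import numpy as np

rng = np.random.default_rng(5)
tri = lambda a, b, c: rng.triangular(a, b, c)

# Task 2: alternative total durations - 70% normal, 20% risk, 10% opportunity
u = rng.random()
if u < 0.70:
    d2 = tri(6, 7, 9)
elif u < 0.90:
    d2 = tri(10, 12, 15)        # risk scenario: longer total duration
else:
    d2 = tri(5, 6, 8)           # opportunity scenario: shorter total duration

f1 = tri(7, 9, 12)              # finish of a predecessor task that starts at week 0
f2 = f1 + d2                    # F-S link: Task 2 starts when Task 1 finishes
f3 = f1 + tri(8, 9, 13)         # a parallel task, also starting when Task 1 finishes

s4 = max(f2, f3)                # Task 4 cannot start until both Task 2 and Task 3 have finished
f4 = s4 + tri(15, 18, 21)
print(f"finish of Task 4 in this iteration: week {f4:.1f}")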
Example 19.4

A new building is to be constructed by a consortium for a client. The project can be divided into seven
sections, as described below. The client wishes to see a risk analysis of the schedule and cost risks and
how these relate to each other.
Design

The detailed design will take (14, 16,21) weeks, but the architect thinks there is about a 20 % chance
that the client will require some rework on the design that will mean an additional (3,4,6) weeks. The
architect team will charge a £160 000 flat fee but will require £12 000 per week for any rework.
Earthworks

The site will have to be levelled. These earthworks can begin immediately on award of the contract.
The earthworks will take (3,4,7) weeks at a cost of (£4200, £4500, £4700) a week. There is a risk that


[Figure 19.9 spreadsheet (only partially recoverable): the schedule model runs from the contract award date of 29-Mar-10 through design (detail plus a rework risk), earthworks, foundations, the three structure floors, the envelope work (Brown Bros and Redd and Greene scenarios), services and finishings, and commissioning, each task carrying a triangular duration, scenario probabilities and start/finish link logic.]

Figure 19.9 Schedule model layout for Example 19.4.


[Figure 19.10 spreadsheet (only partially recoverable): the project cost model, with the architect's fee and rework charge, weekly labour rates multiplied by the simulated durations from the schedule model, the discrete contractor choice for the envelope work, and a formulae table of VoseTriangle and VoseDiscrete functions linked to the schedule model's cells.]

Figure 19.10 Cost model layout for Example 19.4.

the earthworks will reveal artefacts that will require an archaeological survey to be carried out before
any building work can proceed. Local knowledge suggests that there is about a 30 % chance of finding
any significant artefacts and that the surveying would then take (8, 10, 14) weeks.

Foundations

The foundations can be started when the earthworks are complete, and will take (6, 7, 8) weeks.
The costs will be (£2800, £3000, £3300) per week for labour, plus (£37000, £38500, £40000) for
materials.
Structure

The building's structural components (floors, pillars, roofing, etc.) can be started (3,4,6) weeks after
the foundation work is completed, depending on the weather. There are three floors, all exactly the
same, and the building contractor believes each floor can be constructed in (4,4.5,6) weeks depending
mostly on the weather. Each floor will cost (£4700, £5200, £5500) per week for labour and (£17200,
£17 500, £18 000) for materials, depending on the exact final design. The roofing will take (7, 8, 10)
weeks for a fixed price of £172000.


Envelope

The envelope (walls, windows, external doors) work can be started 3 weeks after the first floor is
completed. The materials will cost (£36 000, £37 000, £40 000) per floor, depending on the final architect
design. The ground floor will require high-security doors at a cost of £9800. The envelope work labour
will be provided by Brown Bros Ltd, at a quoted price of £197 000, who estimate that each floor will
take (7, 8,9) weeks to complete, depending on the weather. However, Brown Bros Ltd are being taken
over by another firm, and the Managing Director thinks there is about a 10 % chance that, under the new
owners, they will not be allowed to accept the work. The next alternative is Redd and Greene Ltd who
have quoted £209 000 for the labour. They estimate it would take them (6, 8, 11) weeks to complete
each floor, again depending on the weather.
Services and finishings

The services (plumbing, electricity, computer cabling, etc.) and finishings (internal partitioning, decorations, etc.) can be started when each floor is completed. Services will cost (£82 000, £86 000, £91 000)
for each floor, and finishings will cost (£92 000, £95 000, £107 000) for each floor.
Commissioning

Two weeks are needed after all work has been completed to tidy up the site and test all of the facilities,
at a cost of £4000. It is thought that there is a 40 % chance that the services contractor will be called
back to fix problems, resulting in a delay of (0.5, 2, 5) weeks, and a 5 % chance they could be called
back again, resulting in a further delay of (0.5, 1, 1.5) weeks.
Figure 19.9 illustrates a spreadsheet model of the project plan for this example, using the expected
value of each uncertain task duration. Figure 19.10 illustrates the cost model for this example. Note
that the two models actually reside within the same spreadsheet to allow linking between them. Also
note that a Uniform(0, 1) distribution at cell I1 is being used to control the generation of values from
distributions in cells H9 and W10 which will ensure they have a 100 % correlation. The same applies
with a Uniform(0, 1) distribution at cell I2 which will generate a 100 % correlation between cells H36
and W32. This is equivalent to using a 100 % rank order correlation between the variables.
The distributions of schedule and cost for this model are shown in Figures 19.11 and 19.12. Their
interrelationship is illustrated in the scatter plot of Figure 19.13 which shows that the cost of the project
is not strongly influenced by the amount of time it will take to complete. +

19.2.1 Critical path analysis
In traditional project planning, the duration of each task is given a single-point estimate and an analysis
is performed to determine the critical path, i.e. the tasks that are directly determining the duration of
the project. In a project schedule risk analysis, the critical path will not usually run through the same
line of tasks in every iteration of the model. It is therefore necessary to introduce a new concept: the
critical index. The critical index is calculated for each task and gives the percentage of the iterations
for which that task lies on the critical path.
The critical index is determined by assigning a function to each task in the risk analysis model that
generates a "1" if the task is on the critical path and a " 0 if it is not. This function is nominated as
an output and the mean of its result is then the critical index. It is often not necessary to calculate the
critical index for every task.


[Figure 19.11 plot: cumulative probability (0 to 1) of the project finish date, with the x axis running from 23-Nov-11 to 30-Jul-12.]

Figure 19.11 Cumulative distribution of finish date for Example 19.4.

[Figure 19.12 plot: cumulative probability of the total project cost, with the x axis running from about £1,460,000 to £1,620,000.]

Figure 19.12 Cumulative distribution of project cost for Example 19.4.


[Figure 19.13 scatter plot: total project cost plotted against finish date.]

Figure 19.13 A scatter plot to illustrate the relationship between project duration and cost for Example 19.4.

The structure of the schedule will usually mean that, if one task is on the critical path, several others will be too, or that the critical index in one branch is 1 minus the critical index of another.
Example 19.5

Figure 19.14 illustrates the sort of logic that one can usually apply quickly to determine critical indices.
In this example, tasks A and J are always on the critical path, therefore CI(A) = CI(J) = 1:
If CI(B) = p, then CI(G) = CI(H) = CI(I) = 1 - p and CI(F) = p
If CI(E) = q, then CI(C) = CI(D) = p - q
Figure 19.14 Task linking to determine critical index logic for Example 19.5.


i.e. for this example it is only necessary to determine p and q, and all of the other critical indices can be deduced.
Index p can be determined by writing the following function:

f(p) = IF(start_of_J = finish_of_F, 1, 0)

i.e. if the start of task J equals the end of task F, return a "1", otherwise return a "0".
Index q can be determined by the following function:

f(q) = f(p) * IF(start_of_F = finish_of_E, 1, 0)

i.e. if the start of task F equals the end of task E and the start of task J equals the end of task F, return a "1", otherwise return a "0". +
If the critical index is close to zero, reduction in the duration of the task is very unlikely to affect the
project's duration. On the other hand, progressively larger values of the critical index indicate increasing
influence over a project's duration. The index can therefore help the project manager target those tasks
for which she should attempt to reduce the duration, or reschedule within the project plan in a way that
removes them from the critical path.
If the individual task durations are nominated as outputs, along with the critical index functions, the analyst can look at the raw generated data and analyse in which situations each task is on the critical path. A conditional median analysis (Chapter 15) can be carried out, comparing the data subsets for the duration of each task from the iterations when the task is on the critical path against the entire dataset of generated durations for that task. The higher the value of the conditional median analysis index α associated with a task, the greater the influence that task is having on extending the project's duration.

19.3 Portfolios of risks
A large project may have many risks listed in its risk register - sometimes a thousand or more! For most practical purposes that is an awful lot to put into a simulation model where each one is of the form

= VoseBinomial(1, p) * VosePERT(Min, Mode, Max)

or, with ModelRisk,

= VoseRiskEvent(p, VosePERTObject(Min, Mode, Max))

where p is the probability of occurrence and the PERT distribution reflects the size of the impact should it occur. The VoseRiskEvent function in ModelRisk allows you to simulate a risk event as one variable, rather than the usual two (a Binomial(1, p) or Bernoulli(p) and an impact distribution), which lets you perform a correct sensitivity analysis.
There is a faster way to model the aggregate risk if each risk represents a potential impact on the cost of the project, and there are no cascading effects, meaning that one risk's occurrence does not change the probability or impact size of any other risk.

I

Chapter 19 Project risk analysis

487

Let μI and σI² be the mean and variance of the impact distribution should the risk occur, and p the probability of occurrence. Then the mean and variance of the risk as a whole are given by

μR = p * μI
σR² = p * σI² + p * (1 - p) * μI²

If the risks are independent, then we can estimate the total impact by simply summing the means and variances, i.e.

μTOTAL = Σ μR and σTOTAL² = Σ σR²
Appendix III gives the formulae for the mean and variance for many distributions - certainly all the ones I would expect you would use to model a risk impact. You can also get the moments directly in ModelRisk by using special functions, e.g.

= VoseMean(VosePERTObject(2, 3, 6)) returns the mean of a PERT(2, 3, 6) distribution
= VoseVariance(VoseTriangleObject(D8, E8, F8)) returns the variance of a Triangle(D8, E8, F8) distribution

These functions make life a lot easier if you are using several different distribution types for modelling the impact. If you have a large number of risks - some risk registers have thousands of risks - it is often a reasonable approximation to model the aggregate impact as a Normal(μTOTAL, σTOTAL) distribution.
ModelRisk has several other functions relating to a portfolio of risks that will simulate the aggregate
distribution in one cell (speeding up the simulation a great deal by running it all in C++), identify
the risks of greatest impact to the total mean or variance and fit certain distributions like the normal,
or skewed and shifted distributions based on matching the mean and variance of the aggregate risk
portfolio - the last feature will speed up a simulation enormously if used as a surrogate.
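A minimal sketch of this moment-matching shortcut (assuming PERT impact distributions and the standard PERT moment formulae; the risk register values are invented and this is not the ModelRisk implementation):

import math

# risk register: (probability of occurrence, PERT(min, mode, max) impact)
risks = [(0.15, (5, 10, 30)), (0.05, (50, 80, 200)), (0.30, (1, 3, 8))]

mu_total, var_total = 0.0, 0.0
for p, (a, b, c) in risks:
    mu_i = (a + 4 * b + c) / 6                          # PERT mean
    var_i = (mu_i - a) * (c - mu_i) / 7                 # PERT variance
    mu_total += p * mu_i                                # mean of the risk event
    var_total += p * var_i + p * (1 - p) * mu_i ** 2    # variance of the risk event

print(f"aggregate impact approximately Normal({mu_total:.2f}, {math.sqrt(var_total):.2f})")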

19.4 Cascading Risks
We have just had a postal strike in Gent where I live that lasted, I think, 12 days. The post office lost a fair bit of money on that strike, I imagine, and perhaps the employees too. Belgium is a country of contrasts - we don't have cheques any more, so everyone pays electronically, yet invoices and a vast array of perplexing official administration always come in the mail. So for 12 days my office didn't receive any bills, and for the next 2 weeks the post office was dealing with the backlog, so we got bank statements too late to close the books properly, etc. I wonder what the reaction will be. We bought a rather high-end computer in that period from a small shop that can build fancy custom PCs and they needed paying quickly, so unusually they emailed the invoice and it was paid 10 minutes later. My


son's birthday presents arrived late, printed invitations for an exhibition that was being held in my house
only arrived the morning before the exhibition, etc., etc. Multiply that irritation by a couple of hundred
thousand households and I think the post office will find a considerable impact on their future because
people will look at ways of avoiding using the post office. The impact goes far beyond managing 12 days
of backlog. How about the risk of being forced to privatise - maybe that is now higher. Or the risk
that a competitor will enter the market, or that Belgacom or Telenet, who offer Internet services here,
will capitalise on the strike with some highly amusing advert and get people to think about moving to
electronic mail more.
Ideally, we would like to be able to capture the complete potential impact of a risk to be able
to understand the level of attention it should receive. Maybe some risk occurring in a project has a
relatively minor impact directly, but it increases the chances of another, much larger risk. We can think
of lots of reasons why: management are focused on handling the effects of small risk A so nobody is
paying attention to a looming big risk B; little risk A occurs, blame is passed liberally around, people
stop communicating and helping each other and those who can see big risk B coming along think "It's
not my problem". The biggest risks in projects are, after all, driven by people issues. I think that the
occurrence of a lot of big risks comes at the end of a chain of small risks occurring that were perhaps
much easier and less costly to deal with.
Let's look, therefore, at a couple of simple ways of modelling a cascading set of risks in a project.
We want to model the probability of a risk occurring, and perhaps the size of its impact too, as being
to some degree influenced by whether other risks have occurred. Figure 19.15 gives an example.

Formulae table
                 =VosePERT(C3,D3,E3)
F6               =IF(OR(G3,G5),50%,15%)
F7               =IF(OR(G3,G6),45%,22%)
F10              =IF(AND(G4,G5,G7,G8),13%,5%)
G3:G9, G11:G12   =VoseRiskEvent(F3,VosePERTObject(C3,D3,E3))
G10              =IF(AND(G4,G5,G7,G8),1.5,1)*VoseRiskEvent(F10,VosePERTObject(C10,D10,E10))
G13              =SUM(G3:G12)

Figure 19.15 Model of correlated (cascading) risks.


The model uses the ModelRisk function VoseRiskEvent because it has certain advantages, as we'll see in a minute, but otherwise you can just replace formulae of the type

= VoseRiskEvent(F3, VosePERTObject(C3, D3, E3))

with

= VoseBinomial(1, F3) * VosePERT(C3, D3, E3)

The interesting parts of this model reside in shaded cells F6, F7, F10 and G10, where I have used IF statements to change either the probability or impact of a risk depending on whether other risks or a combination of risks have occurred: if risk A occurs, it increases the probability of risks D and E occurring; and if risks B, C, E and F all occur, then both the probability of occurrence and the size of impact of risk H increase.
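The same cascading logic can be sketched in plain Python (an illustration only: a triangular distribution stands in for the PERT impacts, the impact parameters are invented, and the 15 %/50 % and 22 %/45 % probability switches follow the formulae table of Figure 19.15):

import numpy as np

rng = np.random.default_rng(7)

def risk_event(p, a, b, c):
    """Impact of a discrete risk: 0 if it does not occur, otherwise a
    Triangle(a, b, c) sample (standing in for a PERT impact)."""
    return rng.triangular(a, b, c) if rng.random() < p else 0.0

totals = []
for _ in range(50_000):
    A = risk_event(0.30, 5, 7, 12)
    C = risk_event(0.25, 2, 4, 9)
    # cascade: A or C occurring raises D's probability from 15% to 50%
    D = risk_event(0.50 if (A > 0 or C > 0) else 0.15, 8, 14, 22)
    # cascade: A occurring raises E's probability from 22% to 45%
    E = risk_event(0.45 if A > 0 else 0.22, 3, 8, 15)
    totals.append(A + C + D + E)

print(f"mean total risk impact: {np.mean(totals):.1f}")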
Let's now look at the influence each risk has on the output sum in cell G13. Most Monte Carlo
add-ins have the ability to plot tornado charts (see Section 5.3.7) automatically by saving the values
generated from their distributions during a simulation. Depending on your host Monte Carlo add-in, you
can force it to save values from a cell as if they were generated by that software's own distribution.
We want to see the effect of risk A, so we make cell G3 the input to, say, a Normal(C3, 0) or a
DUniform(C3). Applying this method and running a simulation, I can generate plots like the ones in
Figure 19.16.
The left-hand pane of Figure 19.16 shows that, with correlation added to the model, risk D is most
rank order correlated with the total, which sort of makes sense because it has about a 20 % chance of
occurring (the variance introduced by a risk has a component that is proportional to p * (1 - p), where
p is the probability of occurrence, so, the closer p is to 0.5, the more uncertainty the risk adds) and
a wide impact distribution. The right-hand pane shows that, with no correlation in the model, risk D
drops to second place. That may seem strange because risk D wasn't influencing any other risks, so
why should it lose first place? The answer lies in the fact that in the left pane risk D is being influenced
[Figure 19.16 tornado plots: the rank correlation coefficient between each risk and the total risk impact, shown with the correlations in the model (left pane) and without them (right pane).]

Figure 19.16 Sensitivity analysis results for the model of Figure 19.15. Left pane - with correlations; right pane - without.


by risk A, which also influences risk E. These three risks will therefore all be correlated to some degree
and thus their rank correlation with the output goes up as it gets "credited" for the output correlation
with risks A and E. It is important not to read these types of graphs in a vacuum - one should also
consider any causal direction. Comparing the two plots, you'll see that risks A and E have moved up
two and three places respectively, which aligns with the correlation we've built in.
Finally, since tornado charts remain a little confusing, let's redo the sensitivity analysis using spider plots (Section 5.3.8). To achieve this, we have to be able to control the sampling of the RiskEvent distributions, which we do via the function's optional U-parameter. For example,

= VoseRiskEvent(p, VosePERTObject(Min, Mode, Max))

generates random values from the distribution of possible values of the risk, while adding a U-parameter of 0.9 yields the 90th percentile of the distribution. The spider plot for this model (Figure 19.17) can be plotted by starting at each risk event's 50th percentile, since none of the risks has greater than a 50 % chance of occurring, so their 50th percentiles and below are zero.

[Figure 19.17 spider plot: the effect on the mean of the total risk impact as each risk event's input distribution is stepped from its 50th to its 100th percentile.]

Figure 19.17 Spider plot sensitivity analysis on mean output for the model of Figure 19.15.


Spider plots are easy to read when the variables are all continuous and not so easy to read when they are a discrete-continuous (combined) distribution, but with a little practice they can reveal a lot. Let's take the line for risk A, which starts at the lowest y value on the left and jumps after the 70 % mark: that's because the risk has a 30 % (1 - 70 %) chance of occurring. The minimum value of the impact of risk A is 7, yet, when the line kicks, it jumps up by about 14.6 units. That's because risk D (with a mean of 14.7) increases from 15 to 50 % and risk E (with a mean of 8.4) increases from 22 to 45 %. That gives an expected increase of (50 % - 15 %) * 14.7 + (45 % - 22 %) * 8.4 = 7.1. Here, 7 + 7.1 gives us almost the jump we see - the extra bit is because the first plotted point above x = 70 % is at 72.5 %, which is the 2.5 %/30 % = 8.3rd percentile of the PERT, not its minimum.
You can probably make out that risks D and E have two kicks: D somewhere in [50 %, 52.5 %], and again in [85 %, 87.5 %]. That's because D has either a 15 % or a 50 % chance of occurring, depending on the occurrence of risks A and C. The same thinking applies to risk E. The vertical range that each line takes tells us the variation that the output mean can take depending on each risk event's value. The range is largest for risks A and C, but risk C jumps at a much higher x-axis percentile than risk A, so has less influence in terms of correlation, although more in terms of extreme values for the output mean.


Chapter 20

Insurance and finance risk
analysis modelling
In this chapter I introduce some techniques that have been developed in insurance and finance risk
modelling. Even if insurance and finance are not your fields, you might still find some interesting ideas
in here. Insurance and finance analysts have placed a lot of emphasis on finding numerical solutions to
stochastic problems - something that is highly desirable because it gives more immediate and accurate
answers than Monte Carlo simulation. In this chapter I dash about showing some modelling techniques
to give you a flavour of what can be done. For those of you involved in insurance and finance, I hope it
will also have piqued your curiosity about the ModelRisk software, which is described in Appendix II.
Sections 20.1 to 20.5 explain some techniques from the finance field, and Section 20.6 onwards looks
at some insurance ideas. You will notice that they frequently share some common principles.

20.1 Operational Risk Modelling
Operational risk is defined in the Basel II capital accord as "The risk of loss resulting from inadequate or failed internal processes, people and systems or from external events". It includes: internal and external fraud; workers' discrimination and health and safety; antitrust, trading and accounting violations; natural disasters and terrorism; computer systems failure; data errors; failed mandatory reporting (e.g. sending out statements or policy documents within a required time); and negligent loss of clients' assets. However, it excludes strategic risk and reputation risk, although the latter can be affected by the occurrence of a high-visibility operational risk. Basel II and various corporate scandals have brought operational risk into particular focus in the banking sector, where operational risks are required to be closely and transparently monitored and reported. Sufficient capital must be held in reserve to cover operational risk at a high level of certainty to achieve the highest rating under Basel II. Under Basel II's "Advanced Measurement Approach", which will usually be the least onerous on a bank provided they have the necessary reporting systems in place, operational risk can be modelled as an aggregate portfolio problem similar to insurance risk. The model of Figure 20.1 uses an FFT method to calculate the capital required to cover a bank's risks at the 99.9th percentile level. Basel II allows a bank to use Monte Carlo simulation to determine the 99.9th percentile, but the use of FFT methods is to be preferred over simulation because such a high loss distribution percentile requires a very large number of iterations to determine its value with any precision. The difference between the 99.9th percentile and the expected loss is called the "unexpected loss" and equates to the capital charge that the bank must set aside to cover operational risk.
The model has assumed that each risk is independent (making a Poisson distribution appropriate to model the frequency, although a Pólya or Delaporte may well be better - see Section 8.3.7) and that


[Figure 20.1 spreadsheet (only partially recoverable): each operational risk category is given a Poisson frequency object (column H) and an impact distribution object (column I), and the aggregate loss distribution and capital charge are calculated with an FFT method.]

Formulae table
H3:H47       =VosePoissonObject(E3)
M5 (output)  =VoseAggregateMultiFFT($H$3:$H$47,$I$3:$I$47,M3)-M4

Figure 20.1 Model to determine a financial institution's capital allocation to cover operational risk under Basel II.

the impacts all follow a lognormal distribution. In this model one could have used fitted distribution objects (e.g. VoseLognormalFitObject(data)) that were linked to the available data. The chief difficulty in performing an operational risk calculation is the acquisition of relevant data that could be used to determine the parameters of the distributions. Operational risks, especially those with a large impact, occur very infrequently, so there is often an absence of any data at all within an individual bank. However, one can base the frequency and severity distributions on general banking industry databases, and use credibility theory (for example, using the Bühlmann credibility factor (Klugman et al., 1998)) gradually to assign more weight over time to the individual bank's experience against the industry as a whole. Credibility theory is often used in the insurance industry when one offers a new policy hoping to attract a particular sector of the population with a known risk level: as a history of claims emerges, one migrates from the expected claim frequency and severity to that actually observed.
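As a rough illustration of the frequency/severity aggregation (using brute-force Monte Carlo rather than the FFT route, purely to keep the sketch short; every parameter value below is invented):

import numpy as np

rng = np.random.default_rng(8)
n_iter = 50_000

# hypothetical risk categories: (Poisson events per year, mean severity, sd of severity)
categories = [(2.0, 50_000, 30_000), (0.3, 400_000, 250_000), (5.0, 10_000, 8_000)]

annual_loss = np.zeros(n_iter)
for lam, mean, sd in categories:
    # convert the arithmetic mean/sd to the lognormal's underlying normal parameters
    sigma2 = np.log(1 + (sd / mean) ** 2)
    mu = np.log(mean) - sigma2 / 2
    counts = rng.poisson(lam, n_iter)
    for i, k in enumerate(counts):
        annual_loss[i] += rng.lognormal(mu, np.sqrt(sigma2), k).sum()

expected = annual_loss.mean()
p999 = np.percentile(annual_loss, 99.9)
print(f"expected loss {expected:,.0f}, 99.9th percentile {p999:,.0f}, "
      f"unexpected loss {p999 - expected:,.0f}")

As the text points out, a 99.9th percentile estimated by simulation is noisy unless the number of iterations is very large, which is why the FFT construction is preferred.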

20.2 Credit Risk
Credit risk is the risk of loss due to a debtor's failure or partial failure to repay a loan or other credit
instrument (bonds). We need three components to assess the credit risk of an individual obligor:
1. Default probability.
2. Exposure distribution.
3. Loss given default as a fraction of the exposure.
Components 2 and 3 can sometimes be replaced with a single distribution of loss given default.
Chapter 10 describes how to fit probability distributions to data which are used to estimate 2 and 3
above. Section 9.1.4 describes how to estimate the binomial probability needed for the probability of
default. There are a number of methods we can use to determine the aggregate distribution, the bases
of which are described in Chapter 11.


20.2.1 Single portfolio example
Figure 20.2 shows a credit risk model for a single portfolio of independent, lognormally distributed random individual loss distributions where there are 2135 debtors, each with the same 8.3 % probability of default.
In this model the VoseAggregateMC function in cell C11 is randomly sampling n Lognormal(55, 12) distributions and summing them together, where n is itself a Binomial(2135, 8.3 %) distribution. This is the "brute-force" Monte Carlo approach to summing random variables, but with the advantage that the simulation is all done within C++ (which is faster) and the final random sum is returned back to Excel. Figure 20.3 shows the equivalent model performed in Excel. The model takes up a lot more space (note the number of hidden rows) and runs many times slower, but, more importantly, it needs resizing if we change the number of obligors or the probability of default.
Since the model of Figure 20.2 is estimating a distribution of loss, the value at risk (VaR) (see Section 20.5.1) at the 95th confidence level is simply the 95th percentile of the output distribution, which is returned directly into the spreadsheet at cell E11 at the end of a simulation. In this example, running 1 000 000 iterations produces a VaR of 10 943.62. We can take a different approach by constructing the aggregate distribution using fast Fourier transforms (here using VoseAggregateFFT). Cell C13 generates random values from this distribution, while cell E13 employs the U-parameter (see Appendix II) to return the 95th percentile of the aggregate distribution directly without any simulation. The ModelRisk screen capture of Figure 20.4 shows this second use of the function.
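For readers without the add-in, a quick NumPy sketch of the same brute-force aggregation (matching the inputs quoted above, and assuming the lognormal is parameterised by its arithmetic mean 55 and standard deviation 12; the iteration count is arbitrary):

import numpy as np

rng = np.random.default_rng(9)
n_iter, obligors, p_default = 100_000, 2135, 0.083
mean, sd = 55.0, 12.0

# convert the arithmetic mean/sd to the lognormal's underlying normal parameters
sigma2 = np.log(1 + (sd / mean) ** 2)
mu = np.log(mean) - sigma2 / 2

defaults = rng.binomial(obligors, p_default, n_iter)   # number of defaults per iteration
losses = np.array([rng.lognormal(mu, np.sqrt(sigma2), k).sum() for k in defaults])

print(f"mean loss {losses.mean():,.1f}")
print(f"95% VaR   {np.percentile(losses, 95):,.1f}")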

[Figure 20.2 spreadsheet (only partially recoverable). Inputs: loss mean 55, loss standard deviation 12, a lognormal loss distribution object, 2135 obligors, probability of default 8.30 %, a binomial defaults object and a 95 % confidence level. Method 1 (Monte Carlo simulation) and Method 2 (fast Fourier transform) each return a simulated loss and the corresponding VaR; the value of E11 after 1 million iterations is 10 943.62.]

Formulae table
C4                       =VoseLognormalObject(C2,C3)
C7                       =VoseBinomialObject(C5,C6)
C11                      =VoseAggregateMC(VoseBinomial(C5,C6),C4)
E11 (with @RISK)         =RiskPercentile(C11,D8)
E11 (with Crystal Ball)  =CB.GetForePercentileFN(C11,C8)
C13                      =VoseAggregateFFT(C7,C4)
E13                      =VoseAggregateFFT(C7,C4,,C8)
G13                      =VoseIntegrate("#*VoseAggregateFFTProb(#,C7,C4,,0)",E13,15000,1)

Figure 20.2 Credit risk model for a single portfolio.


[Figure 20.3 spreadsheet (only partially recoverable): the probability of default, the binomial number of defaults and a column of 240 individual loss cells are laid out explicitly, and the total loss output is their sum.]

Formulae table
C6        =VoseBinomial(C4,C5)
C10:C249  =IF(B10>$C$6,0,VoseLognormal($C$2,$C$3))
C7        =SUM(C10:C249)

Figure 20.3 Monte Carlo simulation performed in Excel.

Figure 20.4 Screen capture of the VoseAggregateFFT function.


In cell G13 the formula

=VoseIntegrate("#*VoseAggregateFFTProb(#,C7,C4,,0)", E13, 15000, 1)

calculates the integral to determine the expected shortfall (see Section 20.5.2):

∫ x f(x) dx evaluated from x = E13 to x = 15 000

where E13 is the defined threshold, x is the value of the aggregate loss distribution, f(x) is the density function for the aggregate loss distribution (determined by the function VoseAggregateFFTProb(...)) and 15 000 is a sufficiently large value to be used as a maximum for the integral, as shown in the screen capture of Figure 20.5.

Figure 20.5 Integration to determine the expected shortfall of a credit portfolio. The "Steps" parameter is
an optional integer used to determine how many subintervals are made within each interval approximation
as the function iterates to optimised precision.

20.2.2 Single portfolio example with separate exposure and loss given default distributions

At the beginning of this section I pointed out that one may generally have separate distributions for the amount of exposure a debt holder has, and the fraction of that exposure that is realised as a loss. This means that we need to determine the sum

Total loss = Σ (i = 1 to # defaults) Exposurei * LossFractioni
This is easily done with Monte Carlo simulation in a manner similar to the model of Figure 20.3 but
replacing the lognormal variable with the product of two variables. Alternatively, with ModelRisk you
can use the VoseSumProduct function to return the aggregate distribution with one cell formula. Both
methods are shown in Figure 20.6.
The VoseSimulate function simply generates random values from its object distribution parameter, which allows the user to keep the distribution in one place in the spreadsheet rather than many. The VoseSumProduct(n, a, b, c, ...) function adds n variables together, where each variable is a * b * c * ... and a, b, c, etc., are distribution objects, so an independent sample is taken from each of a, b, c, ... for each of the n variables.
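A hypothetical sketch of the Exposure * LossFraction aggregation (the Lognormal(100, 10) exposure and Beta(13, 43) loss fraction follow the formulae in the accompanying figure, while the portfolio size, default probability, seed and iteration count below are invented):

import numpy as np

rng = np.random.default_rng(10)
n_iter, obligors, p_default = 50_000, 600, 0.023

# exposure ~ Lognormal with arithmetic mean 100 and sd 10; loss fraction ~ Beta(13, 43)
sigma2 = np.log(1 + (10 / 100) ** 2)
mu = np.log(100) - sigma2 / 2

losses = np.empty(n_iter)
for i in range(n_iter):
    k = rng.binomial(obligors, p_default)          # number of defaults this iteration
    exposure = rng.lognormal(mu, np.sqrt(sigma2), k)
    loss_fraction = rng.beta(13, 43, k)
    losses[i] = (exposure * loss_fraction).sum()   # sum of Exposure_i * LossFraction_i

print(f"mean total loss {losses.mean():,.1f}, 95th percentile {np.percentile(losses, 95):,.1f}")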
We can also construct the density function fL(x) for the individual loss distribution as follows:

fL(x) = ∫ (1/y) fF(y) fE(x/y) dy, integrated from y = 0 to 1

[Figure 20.6 spreadsheet (only partially recoverable): Method 1 returns the total loss in a single cell with VoseSumProduct, while Method 2 simulates the number of defaults and sums the individual Exposure * LossFraction products explicitly.]

Formulae table
C2                      =VoseLognormalObject(100,10)
C3                      =VoseBetaObject(13,43)
C6                      =VoseBinomial(C4,C5)
C9                      =VoseSumProduct(C6,C2,C3)
C14:C193                =IF(B12>$C$6,0,VoseLognormal(100,10)*VoseBeta(13,43))
C14:C193 (alternative)  =IF(B12>$C$6,0,VoseSimulate($C$2)*VoseSimulate($C$3))
C11                     =SUM(C14:C193)


where fF() is the density function for the loss fraction distribution and fE() is the density function for the exposure distribution. Then the loss given default distribution can be constructed and used in an FFT calculation. The ModelRisk function VoseAggregateProduct performs this routine.
20.2.3 Default probability as a variable
Default probabilities can perhaps be considered constant over a short period, but over a longer period, or where the market is very volatile, they should also be modelled as functions of the condition of the economy and, for corporate credit risk, as functions of the state of the regional business sector too. The same may apply for the loss given default variable, as the debt holder may recover a smaller fraction of the exposure in more stressing times. This means that we must construct credit portfolios that are disaggregated at an appropriate level and sum their cashflows. Failure to model these correlations will underestimate the risk of losses. For example, we might produce a model that varies the probability of default (PD) as a function of changes in GDP growth (GDP), interest rate (IR) and inflation (I) using an equation of the form

PDt = PD0 * exp[Normal(1 + a * (GDPt - GDP0) + b * (IRt - IR0) - c * (It - I0), σ)]

where a, b and c are constants, t is some time in the future, σ is a residual volatility and X0 means the value of the variable X now.

20.3 Credit Ratings and Markov Chain Models
Markov chains are often used in finance to model the variation in corporations' credit ratings over time.
Rating agencies like Standard & Poor's and Moody's publish transition probability matrices that are
based on how frequently a company that started with, say, an AA rating at some point in time has
dropped to a BBB rating after a year. Provided we have faith in their applicability to the future, we can
use these tables to forecast what the credit rating of a company, or a portfolio of companies, might look
like at some future time using matrix algebra.
Let's imagine that there are just three ratings, A, B and default, with a probability transition matrix
for one year as shown in Table 20.1.
We interpret this table as saying that a random A-rated company has an 81 % probability of remaining
A-rated, an 18 % probability of dropping to a B rating, and a 1 % chance of defaulting on their loans.
Each row must sum to 100 %. Note the matrix assigns a 100 % probability of remaining in default once one is there (called an absorption state).

Table 20.1 Transition matrix example.

From \ To     A       B       Default
A             81%     18%     1%
B             17%     77%     6%
Default       0%      0%      100%


Table 20.2 Transition scenarios and their probabilities for a 2 year period for a company with a B rating.

Path over the 2 years       Probability
B → A → A                   0.17 * 0.81 = 0.1377
B → B → A                   0.77 * 0.17 = 0.1309
B → Default → A             0.06 * 0.00 = 0.0
B → A → B                   0.17 * 0.18 = 0.0306
B → B → B                   0.77 * 0.77 = 0.5929
B → Default → B             0.06 * 0.00 = 0.0
B → A → Default             0.17 * 0.01 = 0.0017
B → B → Default             0.77 * 0.06 = 0.0462
B → Default → Default       0.06 * 1.00 = 0.06
In reality, companies sometimes come out of default, but I'm keeping my example simple to focus on a few features of Markov chains.
Now let's imagine that a company starts with rating B and we want to determine the probability it has of being in each of the three states in 2 years. The possible transitions are shown in Table 20.2.
Thus, the probability that the currently B-rated company will have each rating after 2 years is

P(A rating) = 0.1377 + 0.1309 + 0.0 = 0.2686
P(B rating) = 0.0306 + 0.5929 + 0.0 = 0.6235
P(default) = 0.0017 + 0.0462 + 0.06 = 0.1079

The calculations get rather more tiresome when there are many possible states (not the three we have here), but fortunately we are simply performing a matrix multiplication. Excel's MMULT array function can do this for us quickly, as shown in Figure 20.7: cells D11:F11 give the above calculated probabilities and the table D10:F12 gives the probabilities for a company starting in any other state.
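The same calculation in a short NumPy sketch (using the Table 20.1 matrix; the matrix power reproduces the two-year probabilities above, and the multinomial draw mirrors the portfolio forecast discussed next):

import numpy as np

# one-year transition matrix from Table 20.1 (rows and columns: A, B, Default)
P = np.array([
    [0.81, 0.18, 0.01],
    [0.17, 0.77, 0.06],
    [0.00, 0.00, 1.00],
])

P2 = np.linalg.matrix_power(P, 2)      # two-year transition matrix
print(P2[1].round(4))                  # starting from B: [0.2686, 0.6235, 0.1079]

# forecast a portfolio of 27 A-rated and 39 B-rated companies after 2 years
rng = np.random.default_rng(11)
counts = rng.multinomial(27, P2[0]) + rng.multinomial(39, P2[1])
print("A, B, Default counts:", counts)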
Now let's imagine that we have a portfolio of 27 companies with an A rating and 39 companies with a
B rating, and we would like to forecast what the portfolio might look like in 2 years. Figure 20.8 models
the portfolio in three ways: using just binomial distributions if that is all your simulation software has,
using the multinomial distribution and using ModelRisk's VoseMarkovSample function.
Methods 1 and 2 in this model have two limitations. The first is that the model can become very large if we want to forecast out many periods, because we would have to repeat the MMULT calculation many times. The second, more important, limitation is that the model can only handle integer time steps. So, for example, our transition matrix may be for a year, but we might want to forecast out just 10 days (t = 10/365 = 0.027397). The VoseMarkovSample function removes these obstacles: if the transition matrix is positive definite, we can replace the value 2 in the function at cells F19:F21 with any non-negative value, as illustrated with the formula in cells L19:L21.
I mentioned that in this model "Default" is assumed to be an absorption state. This means that, if a
path exists from any other state (A rating, B rating) to the default state, then eventually all individuals
will end up in default. The model of Figure 20.9 shows the transition matrix for t = 1, 10, 50 and
200. You can see that for t = 50 years there is about an 81 % probability that an A-rated company will

[Figure 20.7 spreadsheet: the one-year transition matrix and the two-year transition matrix obtained by multiplying it by itself with MMULT.]

Figure 20.7 Calculation of a 2 year transition matrix.

[Figure 20.8 spreadsheet (only partially recoverable): the portfolio of A-rated and B-rated companies is forecast two years ahead in three ways - with binomial distributions only, with multinomial distributions, and with ModelRisk's VoseMarkovSample function.]

Formulae table
=VoseBinomial(B4,L4)
=VoseBinomial(B4-F12,M4/(M4+N4))
=B4-F12-G12
L12:N12                          {=VoseMultinomial(B4,F4:H4)}
L13:N13                          {=VoseMultinomial(B5,F5:H5)}
L14:N14                          {=VoseMultinomial(B6,F6:H6)}
F15:H15, L15:N15 (outputs)       =SUM(F12:F14)
F19:F21 (outputs)                {=VoseMarkovSample(B4:B6,F4:H6,2)}
L19:L21 (outputs, non-integer t) {=VoseMarkovSample(B4:B6,F4:H6,K18)}

Figure 20.8 Modelling the joint distribution of credit ratings among a number of companies in a portfolio.

102

Risk Analysis

One year
transition matrix

a

Default

Transition matrix t

17% 77%
0% 0%

40% 36%
34% 32%

6%
100%

Transition matrix
t = 50

Transition matrix
t = 200

10%
9%

9%
8%

81%

0%
0%

24%
34%

B

Default

0%
0%

100%
100%

Formulae table
(=VoseMarkovMatrix(D4:F6,t))

I

Figure 20.9 Transition matrices for large values of t, showing the progressive dominance of the default
absorption state.

Figure 20.10

Markov chain time series showing the progressive dominance of the default state.

Chapter 20 Insurance and finance risk analysis modelling

SO3

already have defaulted, and about an 84 % chance for a B-rated company. By 200 years there is almost
a 100 % chance that any company will have already defaulted. Figure 20.10 shows this effect as a time
series model. If this Markov chain model is a reasonable reflection of reality, one might wonder how
it is that we have so many companies left. A crude but helpfully economic theory of business rating
dynamics assumes that, if a company loses its rating position within a business sector, a competitor will
take its place (either a new company or an existing company changing its rating), so we have a stable
population distribution of rated companies.

20.4 Other Areas of Financial Risk
Market risk concerns equity, interest rate, currency and commodity risks. The returns or values of a
portfolio of assets are subject to individual-level uncertainty but are also subject to various correlations
at two levels: specific risks apply to a small number of assets that are subject to a common acute
driver; and systematic risks apply more generally to a market sector (e.g. the price of natural gas affects
methanol producers) or the market as a whole (e.g. exchange rate). Correctly recognising the effects of
market risk allows an investor to manage its portfolio of assets by mixing negatively correlated assets
to offset specific risks (e.g. investing in both a methanol and an oil company) and systematic risks
(investing in assets in different countries). The quality of market risk analysis is highly dependent on
accurate modelling of correlation. Copulas (Section 13.3) are particularly helpful in this regard because
they offer considerable flexibility in representing and fitting the structure of any correlation.
Liquidity risk concerns a party who is an owner, or is considering becoming an owner, of an asset,
and is unable to trade the asset at its proper value (or at all) because nobody in the market is interested.
Liquidity risk arises when the party needs to raise cash at short notice, so it is intimately linked to the
cash position of the party. The problem mostly concerns emerging or low-volume markets when the
assets are stocks, etc. When the asset is an entire company and there are few potential buyers, the buyer
may take advantage of the urgency to sell.

20.5 Measures of Risk
20.5.1 Value at risk
Value at risk (VaR) is defined as the amount that, over a predefined amount of time, losses won't exceed
at a specified confidence level. As such, VaR represents a worst-case scenario at a particular confidence
level. It does not include the time value of money, as there is no discount rate applied to each period's
cashflows. VaR is very often used at banks and insurance companies to give some feel for the riskiness
of an investment strategy. In that case, the time horizon should be the time required to be able to
liquidate the investments in an orderly fashion.
VaR is very easy to calculate using Monte Carlo simulation on a cashflow model. You set up a cell
to sum the cashflows over the period of interest, say in cell Al. Then, in another cell, to get the VaR
at the 95 % confidence level, you write

= - CB.GetForePercentileFN(A1,5%)

in Crystal Ball


which will return the VaR directly into the spreadsheet cell at the end of the simulation. Otherwise, just
run a simulation with A1 as an output, read off the 5th percentile and put a minus sign in front of it. I
gave an example of a VaR calculation in Figure 20.2.
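If your simulation software does not return percentiles directly, the same calculation is easy in any environment. The sketch below uses an arbitrary placeholder cashflow model (Normal monthly cashflows) purely to show the mechanics; it is not a model from the text.

import numpy as np

rng = np.random.default_rng()
# Placeholder model: 12 monthly cashflows, each Normal(0, 40) - illustrative only
outcomes = rng.normal(0, 40, size=(10_000, 12)).sum(axis=1)
var_95 = -np.percentile(outcomes, 5)      # 95 % VaR = minus the 5th percentile of outcomes
print(round(var_95, 1))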
The main problem with VaR is that it is not subadditive (Embrechts, 2000), meaning that it is
possible to design two portfolios, X and Y, in such a way that VaR(X + Y) > VaR(X) + VaR(Y). That
is counterintuitive because we would normally consider that, by investing in independent portfolios, we
have reduced risk.
For example, consider a defaultable corporate bond. A bond is a contract of debt: the corporation owes
to the bond holder an amount called the face value of the bond and has to pay at a certain time - unless
they go bust, in which case they default and the bond holder gets nothing. Let's say that the probability
of default is 1 %, the face value is $100 and the current price of the bond is $98.5. If you buy this bond,
the total cashflow at exercise can be modelled as

Cashflow = -$98.5 + $100 * Bernoulli(99 %)

i.e. a 1 % chance of -$98.5 and a 99 % chance of $1.50: a mean of $0.50.
You could buy 50 such bonds for the same company, which means that you will either get 50 * $100
or nothing, i.e. a cashflow of

Cashflow = -$4925 + $5000 * Bernoulli(99 %)

with a mean of $25. Alternatively, perhaps you could buy 50 bonds with the same face value and
default probability but each with a different company, where the companies have no connection between
them so the default events are completely independent, in which case your cashflow is

Cashflow = -$4925 + $100 * Binomial(50, 99 %)

but has the same mean of $25. Obviously the latter is less risky since one would expect most bonds to
be honoured. Figure 20.11 plots the two cumulative distributions of revenue together.

1

-

1
-5000

0.9 0.8

-

- 0.7 .&

-

0
.-

S

d

0.6 -

-

0.5 0.4

-

3

0.3

-

0.2

-

0.1

-

ea

.-

z

4000

-3000

-2000

-1000

1030

0.1 -

m

0.01 - -

.- - - - - .- - - - .- .- - -

n

en

5
z
3

.-

50 companies

0 i ' - - - 7 -5000
4000

O.OO'

a

-

-

-

-

-

-

-

-

-

7

-

-

-2000

50 companies

0.0001 -

0.00001

--3000

-

-

-

-1000

0

Revenue

1000

0.000001 Revenue

Figure 20.11 Cumulative distributions for the two bond portfolios. The probability scale is in logs in the

right pane to show details at low probability.

Figure 20.12 Relationship between required confidence and VaR for the two bond portfolios in the
example.

The relationship between required confidence and VaR for these two portfolios is shown in Figure
20.12.
In this example the VaR is larger for the diversified portfolio (the 50 bonds) until the required
confidence is greater than (1 - default probability). For models of more traditional investments, where
revenues can be considered approximately normal or t-distributed (elliptical portfolios), this was not
a problem even with correlations. However, financial instruments like derivatives and bonds are now
traded very heavily, and these are either discrete variables or a discrete-continuous mix, and non-subadditivity then applies. For example, if analysts use optimising software (as is often done) to find a
portfolio that minimises the ratio of VaR to mean return, they can inadvertently end up with a highly
risky portfolio. The expected shortfall method of valuing risk discussed next avoids this problem.

20.5.2 Expected shortfall
Expected shortfall (ES) is superior to VaR because it satisfies all the requirements for a coherent risk
measure (Artzner et al., 1997), including subadditivity. It is simply negative the mean of the loss returns
in the distribution with a greater loss than specified by the confidence interval. For example, the 99 %
ES is negative the mean of the returns under the remaining 1 % of the distribution. The 100 % ES
is just negative the mean of the revenue distribution. Figure 20.13 shows the ES for the two bond
portfolios.
You can see that the 50-company-bond portfolio has a lower ES until the required confidence exceeds
(1 - default probability), in which case they match. The chance that none of the 50 different bonds
defaults is 99 %^50 = 60.5 %, so once we pass the 39.5 % confidence level the ES returns negative the
portfolio expected value of 25. For the portfolio of the same bonds with one company we are only 99 %
confident of no default and thus receiving payment to give us the portfolio expected value.
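A self-contained sketch of the two portfolios defined earlier (1 % default probability, $100 face value, $98.5 price) that estimates both VaR and ES by simulation; the simulation code is illustrative rather than the book's spreadsheet model.

import numpy as np

rng = np.random.default_rng(1)
n_sims = 100_000
one_company = -4925 + 5000 * rng.binomial(1, 0.99, n_sims)      # 50 bonds, single issuer
fifty_companies = -4925 + 100 * rng.binomial(50, 0.99, n_sims)  # 50 independent issuers

def var_and_es(revenue, confidence=0.95):
    worst = np.sort(revenue)[: int(len(revenue) * (1 - confidence))]  # the worst (1 - c) of outcomes
    return -np.percentile(revenue, 100 * (1 - confidence)), -worst.mean()

print("one company :", var_and_es(one_company))     # lower VaR but much higher ES
print("50 companies:", var_and_es(fifty_companies))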

Figure 20.13 Relationship between required confidence and ES for the two bond portfolios in the example.

20.6 Term Life Insurance
In term life insurance, cover is purchased that will pay a certain sum if the insured person dies within
the term of the contract (usually a year, but it could be 5, 10, 15, etc., years). Since the insured are
generally healthy, the insurer has a very low probability (maybe as low as 1 %) of paying out, which
means that the insured can obtain cover for a low premium. The major downside for the insured party
is that there is no obligation to renew cover beyond the original term, so a person acquiring a terminal
illness within the term of the cover won't be able to obtain a renewal.
The insurance contract generally provides for a fixed benefit B in the event of death. Thus the payout
P can be modelled as

P = B * Bernoulli(q)
where q is the probability that the individual will die within the contract term. Actuaries determine the
value of q from mortality tables (aka life tables). These tables determine the probability of death in the
next year given that one is currently a certain age combined with other factors like whether one smokes,
or has smoked, whether one is a diabetic, etc.
An insurer offering term life insurance will have a portfolio of people with different q values and
different benefit levels B. One could calculate the distribution of total payout for n insured persons by
summing each of the n individual payout amounts:

Total payout = B_1 * Bernoulli(q_1) + B_2 * Bernoulli(q_2) + ... + B_n * Bernoulli(q_n)

This is often called the individual life model. It is rather laborious to determine for large n. An alternative
is to group insured people into the number n_ij who have similar probabilities q_j of death and similar
benefit levels B_i:

Total payout = Σ_i Σ_j B_i * Binomial(n_ij, q_j)

where the counts n_ij sum to n.
De Pril (1986, 1989) determined a recursive formula that allows the exact computation of the aggregate
claim distribution under the assumption that the benefits are fixed values rather than random variables
and take integer multiples of some convenient base (e.g. $1000) with a maximum value M * base, i.e.
B_i = (1 ... M) * base. Let n_ij be the number of policies with benefit B_i and probability of claim q_j.
Then De Pril demonstrates that the probability p(x) that the aggregate benefit payout X will be equal
to x * base is given by the recursive formula

p(x) = (1/x) Σ_{i=1..min(x,M)} Σ_{k=1..⌊x/i⌋} p(x - ik) h(i, k)     for x = 1, 2, 3, ...

and

p(0) = Π_j Π_i (1 - q_j)^n_ij

where

h(i, k) = i (-1)^(k+1) Σ_j n_ij (q_j / (1 - q_j))^k

The formula has the benefit of being exact, but it is very computationally intensive. However, the
number of computations can usually be significantly reduced if one accepts ignoring small aggregate
costs to the insurer. Let K be a positive integer. Then the recursive formulae above are modified as
follows: the inner summation over k is truncated at min(K, ⌊x/i⌋), i.e. terms with k > K are ignored.
Dickson (2005) recommends using a value of 4 for K. The De Pril method can be seen as the
counterpart to Panjer's recursive method for the collective model. ModelRisk offers a set of functions
for implementing De Pril's method.
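A minimal Python sketch of De Pril's recursion as stated above. The function name and the array layout for n_ij are my own illustrative choices, not ModelRisk's API; the optional K argument applies the truncation just described.

import numpy as np

def de_pril(n, q, max_x, K=None):
    # n[i][j]: number of policies with benefit (i+1)*base and claim probability q[j]
    # returns p[0..max_x] where p[x] = P(aggregate payout = x*base)
    M, J = len(n), len(q)
    q = np.asarray(q, dtype=float)
    p = np.zeros(max_x + 1)
    p[0] = np.prod([(1 - q[j]) ** sum(n[i][j] for i in range(M)) for j in range(J)])

    def h(i, k):   # i is the benefit multiple (1-based)
        return i * (-1) ** (k + 1) * sum(n[i - 1][j] * (q[j] / (1 - q[j])) ** k for j in range(J))

    for x in range(1, max_x + 1):
        total = 0.0
        for i in range(1, min(x, M) + 1):
            k_max = x // i if K is None else min(K, x // i)
            for k in range(1, k_max + 1):
                total += p[x - i * k] * h(i, k)
        p[x] = total / x
    return p

# e.g. p = de_pril(n=[[5, 3], [10, 2]], q=[0.01, 0.02], max_x=30, K=4)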


20.6.1 Compound Poisson approximation
The compound Poisson approximation assumes that the probability of payout for an individual policy
is fairly small - which is usually true - and has the advantage over the De Pril method of allowing
the payout distribution to be a random variable rather than a fixed amount.
Let n_j be the number of policies with probability of claim q_j. The number of payouts in this stratum
is therefore Binomial(n_j, q_j). If n_j is large and q_j is small, the binomial is well approximated by
a Poisson(n_j * q_j) = Poisson(λ_j) distribution (n^0.31 q < 0.47 is a good rule), where λ_j = n_j * q_j. The
additive property of the Poisson distribution tells us that the expected frequency of payouts over all
groups of lines of insurance is given by

λ_all = Σ_j λ_j

and the total number of claims = Poisson(λ_all). The probability that one of these claims, randomly
selected, comes from stratum j is given by

P(stratum j) = λ_j / λ_all
Let F_j(x) be the cumulative distribution function for the claim size of stratum j. The probability that
a random claim is less than or equal to some value x is therefore

F(x) = Σ_j (λ_j / λ_all) F_j(x)

Thus, we can consider the aggregate distribution for the total claims to have a frequency distribution
equal to Poisson(λ_all) and a severity distribution given by F(x). ModelRisk offers several functions
related to this method that use Panjer recursion or FFT methods to determine the aggregate distribution
or descriptive statistics such as moments and percentiles.
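A sketch of this construction in Python; the stratum sizes, claim probabilities and lognormal claim-size parameters below are illustrative assumptions, not values from the text.

import numpy as np

rng = np.random.default_rng()
n = np.array([4000, 2500, 1000])       # policies per stratum (illustrative)
q = np.array([0.010, 0.015, 0.020])    # claim probability per stratum (illustrative)
lam = n * q                            # lambda_j for each stratum
sizes = [(7.0, 0.8), (7.5, 0.7), (8.0, 0.6)]   # lognormal (mu, sigma) per stratum (illustrative)

def aggregate_loss():
    n_claims = rng.poisson(lam.sum())                        # total claims ~ Poisson(lambda_all)
    strata = rng.choice(len(lam), size=n_claims, p=lam / lam.sum())
    return sum(rng.lognormal(*sizes[j]) for j in strata)

losses = np.array([aggregate_loss() for _ in range(5000)])
print(losses.mean(), np.percentile(losses, 99))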

20.6.2 Permanent life insurance
Here, the insured are covered for their entire life, providing the premiums continue to be paid, so
a payment will always be made and the policy accrues value over time. Some policies also allow the
flexibility of withdrawing cash from the policy. Thus, permanent life insurance is more of an investment
instrument for the policyholder with a final payout determined by the death of the insured. Determining
premiums requires simulating the possible lifetime of the insured and applying a discounted cashflow
calculation to the income (annual premiums) and costs (payment at death and administration costs over
the life of the policy).
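A toy sketch of that simulation follows; the mortality rates, premium, benefit and discount rate are all invented for illustration, and administration costs are ignored.

import numpy as np

rng = np.random.default_rng()
q = np.minimum(0.01 * 1.08 ** np.arange(60), 1.0)   # hypothetical annual death probabilities
premium, benefit, r = 2000.0, 150_000.0, 0.05

def policy_npv():
    npv = 0.0
    for year, q_year in enumerate(q):
        npv += premium / (1 + r) ** year              # premium paid at the start of each year
        if rng.random() < q_year:                     # insured dies during this year
            npv -= benefit / (1 + r) ** (year + 1)    # benefit paid at the end of the year
            break
    return npv

print(np.mean([policy_npv() for _ in range(10_000)]))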

Chapter 20 Insurance and finance risk analysis modelling

509

20.7 Accident lnsurance
An insurance company will provide cover for financial loss associated with accidents for a fixed period,
usually a year (for example: car, house, boat and fire insurance). The amount the insurer pays out is
a function of the number of accidents that will occur and the cost to the insurer of each accident.
The starting point is to assume that, for a given insured party, the accidents are unconnected to each
other and occur randomly in time, which satisfies the assumptions underpinning a Poisson process. The
frequency of accidents is then characterised by the expected frequency λ over the insured period.
Insurance companies nearly always apply a retention or deductible D. This means that the insured
party pays the first part of the cost in an insured event up to D and the insurance company pays
(cost - D) thereafter, so a cost x and claim size y are related as follows:

y = MAX(x - D, 0)

Most insurance policies also apply a maximum limit L, so that, if the cost of the risk event is greater
than L + D, the policy only pays out L. Then we have

y = IF(x - D < 0, 0, IF(x - D > L, L, x - D))                (20.1)

So, for example, if x = Lognormal(10, 7), D = 5 and L = 25, we get the claim distribution shown in
Figure 20.14.
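A quick numerical sketch of Equation (20.1) applied to simulated costs; the lognormal is specified by its mean of 10 and standard deviation of 7, so the underlying normal parameters are derived from those moments.

import numpy as np

rng = np.random.default_rng()
mean, sd, D, L = 10.0, 7.0, 5.0, 25.0
sigma = np.sqrt(np.log(1 + (sd / mean) ** 2))       # convert mean/sd to lognormal parameters
mu = np.log(mean) - sigma ** 2 / 2

x = rng.lognormal(mu, sigma, 100_000)               # risk event costs
y = np.clip(x - D, 0.0, L)                          # claim sizes, equivalent to Equation (20.1)
print((y == 0).mean(), y.mean())                    # probability mass at zero, mean claim size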
Determining an aggregate distribution of claim sizes is simulation intensive, especially if the number
of claims can be large, but there are a couple of ways to speed up the simulation. Consider the following
problem. Historically we have observed 0.21 claims per insured party per year when the deductible was
8, and we wish to model the aggregate loss distribution for 3500 insured parties over a year with a new
deductible of 5.

Figure 20.14 Comparison of claim size vs risk event cost distribution. Vertical lines with markers represent
a probability mass.


The claims data we had were truncated (Section 10.3.3 describes how to fit a distribution to truncated
data). We'll assume that we were able to estimate the underlying risk event cost distribution to be the
Lognormal(10, 7) from above. Thus, the expected rate of risk events was 1/(1 - F(8)) times the rate of
claims, where F(x) is the cumulative probability for a Lognormal(10, 7). The fraction of these events
that will incur a claim is now (1 - F(5)), so the new frequency of claims per insured party per year
will be 0.21 * (1 - F(5))/(1 - F(8)). From Figure 20.14 you can see that the claim size distribution
excluding zero is a lognormal distribution truncated at 5 and 30 and then shifted -5 units, i.e.
=VoseLognormal(10, 7, , VoseXBounds(5,30), VoseShift(-5))
=RiskLognorm(10, 7, RiskTrunc(5,30), RiskShift(-5))

in ModelRisk
in @RISK

We can use this distribution to simulate only claims rather than the entire risk event cost. ModelRisk
has a range of special functions that convert a distribution object into another object of the form of
Figure 20.14. For example

=VoseDeductObject(CostDistributionObject, deductible, maxlimit, zeros)
The zeros parameter is either TRUE or FALSE (or omitted). If FALSE, the DeductObject has no
probability mass at zero, i.e. it is the distribution of a claim size given that a claim has occurred. If
TRUE, the DeductObject has probability mass at zero, i.e. it is the distribution of a claim size given
that a risk event has occurred. The DeductObject can be used in all the usual ways, e.g. if we start with
the Lognormal(l0, 7) distribution from above and apply a deductible and limit:
A1: =VoseLognormalObject(10,7)
A2: =VoseDeductObject(A1,5,25,TRUE)
The object in cell A2 can then be used in recursive and FFT aggregate methods, since these methods
discretise the individual loss distribution and can therefore take care of a discrete-continuous mixture.
Thus, we can use, for example

to simulate the cost of Poisson(2700) random claims, and

to calculate the 99th percentile of the aggregate cost of Poisson(2700) random claims.
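As a plain Monte Carlo alternative (a sketch only), the aggregate cost of Poisson(2700) claims can be simulated directly by drawing each claim from the cost distribution conditioned on exceeding the deductible, shifting by the deductible and capping at the limit:

import numpy as np
from scipy import stats

rng = np.random.default_rng()
mean, sd, D, L = 10.0, 7.0, 5.0, 25.0
sigma = np.sqrt(np.log(1 + (sd / mean) ** 2))
mu = np.log(mean) - sigma ** 2 / 2
cost = stats.lognorm(s=sigma, scale=np.exp(mu))

def aggregate_cost():
    n = rng.poisson(2700)
    u = rng.uniform(cost.cdf(D), 1.0, n)            # cost conditioned on exceeding the deductible
    return np.minimum(cost.ppf(u) - D, L).sum()     # shift by -D and cap at the limit

sims = np.array([aggregate_cost() for _ in range(2000)])
print(np.percentile(sims, 99))                      # 99th percentile of the aggregate cost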

20.7.1 Non-standard insurance policies
Insurance policies are becoming ever more flexible in their terms, and more complex to model as a
result. For example, we might have a policy with a deductible of 5, and a limit of 20 beyond which

Chapter 20 Insurance and finance risk analysis modelling

5 II

the insurer pays only half the damages. Using a cost distribution of Lognormal(31, 23) and an accident
frequency distribution of Delaporte(3, 5, 40), we can model this as follows:
A1: =VoseLognormalObject(31,23)
A2: =VoseExpression("IF(#1>20, (#1 - 25)/2, IF(#1 < 5, 0, #1))", A1)
A3 (output): =VoseAggregateMC(VoseDelaporte(3,5,40),A2)
The VoseExpression function allows one a great deal of flexibility. The "#1" refers to the distribution
linked to cell A1. Each time the VoseExpression function is called, it will generate a new value from
the lognormal distribution and perform the calculation replacing "#1" with the generated value. The
Delaporte function will generate a value (call it n) from this distribution, and the AggregateMC function
will then call the VoseExpression function n times, adding as it goes along and returning the sum into
the spreadsheet.
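The same mechanism is easy to sketch outside the spreadsheet. In the illustration below the payout rule is my own reading of "deductible of 5, limit of 20, half the damages beyond that" rather than the exact VoseExpression above, and the Delaporte(3, 5, 40) frequency is replaced by a Poisson with the same mean (55) simply because a Delaporte generator is not assumed to be available.

import numpy as np

rng = np.random.default_rng()
mean, sd = 31.0, 23.0
sigma = np.sqrt(np.log(1 + (sd / mean) ** 2))
mu = np.log(mean) - sigma ** 2 / 2

def payout(x):
    claim = np.maximum(x - 5.0, 0.0)                                      # deductible of 5
    return np.minimum(claim, 20.0) + 0.5 * np.maximum(claim - 20.0, 0.0)  # half paid beyond 20

def aggregate():
    n = rng.poisson(55)                               # stand-in for Delaporte(3, 5, 40)
    return payout(rng.lognormal(mu, sigma, n)).sum()

print(np.mean([aggregate() for _ in range(2000)]))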
The VoseExpression allows several random variables to take part in the calculation. For example

will model a cost that follows a Lognormal(20, 7) distribution with 30 % probability and zero with 70 %
probability, while

will model a cost that follows a (Lognormal(20, 7) + VosePareto(4, 7)) distribution with 30 % probability
and zero with 70 % probability.

20.8 Modelling a Correlated Insurance Portfolio
Imagine that you are an insurance company with several different policies. For each policy you have
the number of policy holders, the expected number of accidents per policy per year and the mean and
standard deviation of the cost of each accident, and each policy has its own deductible and limit. It is
a simple, though perhaps laborious, exercise to model the total payout associated with one policy and
to sum the aggregate payout using simulation. Now imagine that you feel there is likely to be some
correlation between these aggregate payouts: perhaps historic data have shown this to be the case. Using
simulation, we cannot correlate the aggregate payout distributions. However, we can include a correlation
if we use FFT methods to construct the aggregate loss distribution. The model of Figure 20.15 shows the
aggregate loss distribution of five different policies being correlated together via a Clayton(l0) copula
(Section 13.3.1). Note that the equations used in cells C10:C14 use 1 minus the Clayton copula values,
which will make the large aggregate claim values correlate more tightly than at the low end.

5 12

Risk Analysis

Formulae table
C10:C14   =VoseAggregateFFT(E10,F10,,1-D10)
D10:D14   {=VoseCopulaMultiClayton(10)}
E10:E14   =VosePoissonObject(C3*D3)

Figure 20.15 Simulating the loss distribution for a number of policies where the aggregate loss distribution
for policies is correlated in some fashion.

20.9 Modelling Extremes
Imagine that we have a reasonably large dataset of the impacts of natural disasters that an insurance
or reinsurance company covers. It is quite common to fit such a dataset, or at least the high-end tail
values, to a Pareto distribution, because this has a longer tail than any other distribution (excepting
curiosities like the Cauchy, slash and Lévy, which have infinite tails but are symmetric). An insurance
company will often run a stress test of a "worst-case" scenario where several really high impacts hit
the company within a certain period. So, for example, we might ask what could be the size of the
largest of 10000 impacts drawn from a fitted Pareto(5, 2) distribution modelling the impact of a risk in
$billion.
Order statistics tells us that the cumulative probability U of the largest of n samples drawn from a
continuous distribution will follow a Beta(n, 1) distribution (see Section 10.2.2). We can use this U
value to invert the cdf of the Pareto distribution. A Pareto(θ, a) distribution has cdf

F(x) = 1 - (a/x)^θ,    x ≥ a

giving

x = a(1 - F)^(-1/θ)

Thus, we can directly generate what the value of the largest of 10000 values drawn from this
distribution might be in $billion as follows:

largest = a(1 - Beta(n, 1))^(-1/θ)     with θ = 5, a = 2 and n = 10000

Figure 20.16 Determination of extreme values for 10000 independent random variables drawn from a
Pareto(5, 2) distribution. Manual method: simulate largest 9.426 $billion, largest with probability P 22.857 $billion.
ModelRisk methods: simulate largest 14.823 $billion, largest with probability P 22.857 $billion. Totals for the
six largest and six smallest values: 64.103 and 12.001 $billion.
Formulae table:
C4        =VoseParetoObject(C2,C3)
G3        =a*(1-VoseBeta(n,1))^(-1/theta)
G4        =a*(1-BETAINV(P,n,1))^(-1/theta)
D10       =VoseLargest(C4,n)
D11       =VoseLargest(C4,n,P)
G11:G16   {=VoseLargestSet(C4,n)}
H11:H16   {=VoseSmallestSet(C4,n)}

One can also determine, for example, the value we are 95 % sure the largest of 10 000 risk impacts
will not exceed by simply finding the 95th percentile of the beta distribution and using that in the above
equation instead:

The same method can be applied to any distribution for which we have the inverse to the cdf. The
principle can also be extended to model the largest set of values, or smallest, as shown in the model of
Figure 20.16 which allows you to simulate the sum of the largest several possible values (in this case
six, since the array function has been input to cover six cells). ModelRisk can perform these extreme
calculations for all of its 70+ univariate distributions (Figure 20.17).
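A numpy sketch of this order-statistic shortcut for the Pareto(5, 2) example (values in $billion); an illustration, not the spreadsheet model itself.

import numpy as np
from scipy import stats

rng = np.random.default_rng()
theta, a, n = 5.0, 2.0, 10_000

u = rng.beta(n, 1)                                  # cdf position of the largest of n values
largest = a * (1 - u) ** (-1 / theta)               # one simulated largest impact
bound_95 = a * (1 - stats.beta.ppf(0.95, n, 1)) ** (-1 / theta)   # 95 % sure it won't exceed this
print(round(largest, 2), round(bound_95, 2))        # the bound is about 22.9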

20.10 Premium Calculations
Imagine that we are insuring a policy holder against the total damage X that might be accrued in
automobile accidents over a year. The number of accidents the policyholder might have is modelled as
Pólya(0.26, 0.73). The damages incurred in any one accident are $Lognormal(300, 50). The insurer has
to determine the premium to be charged.
The premium must be at least greater than the expected payout E[X], otherwise, according to the
law of large numbers, in the long run the insurer will be ruined. The expected payout is the product
of the expected values of the Pólya and lognormal distributions, in this case = 0.1898 * $300 = $56.94.
The question is then: how much more should the premium be over the expected value? Actuaries

5 14

Risk Analysis

Figure 20.17 ModelRisk screen capture of the Function View for the VoseLargest array function in cells
G11:G16 from the model in Figure 20.16, showing the relationships between the marginal distributions for
the six largest-value variables.

have a variety of methods to determine the premium. Four of the most common methods, shown in
Figure 20.18, are:
1. Expected value principle. This calculates the premium in excess of E[X] as some fraction θ of
E[X]:

Premium = (1 + θ)E[X],   θ > 0

Ignoring administration costs, θ represents the return the insurer is getting over the expected capital
required, E[X], to cover the risk.
2. The standard deviation principle. This calculates the premium in excess of E[X] as some multiple
α of the standard deviation of X:

Premium = E[X] + α σ[X],   α > 0

The problem with this principle is that, at an individual level, there is no consistency in the level
of risk the insurer is taking for the expected profit α σ[X], since α has no consistent probabilistic
interpretation.
3. Esscher principle. The Esscher method calculates the ratio of the expected values of X e^(hX) to e^(hX):

Premium = E[X e^(hX)] / E[e^(hX)],   h > 0


Figure 20.18 Calculation of premiums under four different principles. The mean and variance of the
aggregate distribution, shown with a grey background, provide a reference point (e.g. the premiums must
exceed the mean of 56.94).

The principle gets its name from the Esscher transform which converts a density function from
f(x) to a * f(x) * exp[b * x], where a and b are constants. It was introduced by Bühlmann (1980)
in an attempt to acknowledge that the premium price for an insurance policy is a function of the
market conditions in addition to the level of risk being covered. Wang (2003) gives a nice review.
4. Risk adjusted or PH principle. This is a special case of the proportional hazards premium principle
based on coherent risk measures (see, for example, Wang, 1996). The survival function (1 - F(x))
of the aggregate distribution, which lies on [0, 1], is transformed into another variable that also lies
on [0, 1]:

Premium = ∫_min^max [1 - F(x)]^(1/p) dx,   p > 1

where F(x) is the cumulative distribution function from the aggregate distribution.
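For intuition, the four premiums can be estimated from simulated aggregate payouts. In the sketch below the Pólya frequency is represented by its negative binomial equivalent, the severity is the $Lognormal(300, 50) above, and θ, α, h and p are arbitrary illustrative choices; the PH premium is approximated from the empirical survival function.

import numpy as np

rng = np.random.default_rng()
mean, sd = 300.0, 50.0
sigma = np.sqrt(np.log(1 + (sd / mean) ** 2))
mu = np.log(mean) - sigma ** 2 / 2

claims = rng.negative_binomial(0.26, 1 / 1.73, 100_000)   # Pólya(0.26, 0.73) claim counts
X = np.array([rng.lognormal(mu, sigma, k).sum() for k in claims])

theta, alpha, h, p = 0.2, 0.5, 0.001, 1.5
ev = (1 + theta) * X.mean()
sdp = X.mean() + alpha * X.std()
esscher = np.mean(X * np.exp(h * X)) / np.mean(np.exp(h * X))

xs = np.sort(X)                                    # PH premium from the empirical survival function
surv = (1 - np.arange(1, len(xs) + 1) / len(xs)) ** (1 / p)
ph = np.sum(np.diff(xs) * (surv[:-1] + surv[1:]) / 2)
print(ev, sdp, esscher, ph)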

Chapter 2 1

Microbial food safety risk assessment
In the field of microbial risk, modelling is called "risk assessment" rather than risk analysis, and I'll
follow that terminology in this chapter. Microbial risk assessment attempts to assess the human health
impact of the presence of bacteria on food. It is a big problem: I'm sure you, or someone close to you,
will have experienced food poisoning at some time. The usual effect is 2-10 days of vomiting, cramps,
fever and/or diarrhoea, but it can be a lot more serious, from hospitalisation to permanent disability and
death. A considerable part of our company's work has been in microbial risk assessment. We teach an
advanced two-week course on modelling methods, and I have coauthored or edited a number of national
and international guidelines on microbial risk assessment.
Microbial risk assessment has been undertaken since the mid-1990s and built initially (with limited
success) on the development of nuclear and toxicological human health risk assessments. The modelling
techniques that are available now are reasonably advanced and generally based around some form of
Monte Carlo model. The major problems faced by a risk assessor are the availability of recent, targeted
data and adequately capturing the complexity of the real-world problem. The goal of the modelling is to
evaluate the health impact of changes to the system (and perhaps minimise the effect): e.g. more control
of animal waste, a new product entering the market, changing the regulatory requirements on testing
farms, food or animal food, changes to farm practices, educating people to be more careful with their
food, etc. The most common approach to microbial food safety has been to try to model the pathways
that bacteria take from the farm to the human ingestion event, figuring out along the way how many
bacteria there might be (probability distributions). The numbers of bacteria ingested are then translated
into a probability of infection, illness, severe illness and/or mortality through a dose-response model.
Put it all together and we get an estimate of the human impact of the bacterium.
To illustrate the level of complexity, consider the biggest foodborne illness risk in the developed
world - non-typhoidal salmonellosis, which you get from exposure to a group of bacteria known as
Salmonella (named after its discoverer Dr Daniel Salmon). For a random person to contract salmonellosis, he or she must have ingested (ate, drank, breathed in) one of these bacteria that then found a
suitable place in the body (the lower intestine) to set up home and multiply (colonise), which is the
infection. You need just one bacterium to become infected because it then multiplies in the human gut,
but if you eat something with thousands of bacteria on it you obviously have more chance of infection.
Usually an infected person will experience diarrhoea, fever and cramps which stop on their own after
6 days or so, but in severe cases a person can be hospitalised and can develop Reiter's syndrome, which
is really unpleasant and can last for years, or, if the infection spreads from the gut into the bloodstream,
it can attack the body's organs and result in death.
These bacteria grow naturally without effect in the intestines of animals, many of which we eat.
The major sources of human exposure are food related: eggs are generally the biggest one (Salmonella
enteritidis), but also chicken, beef, pork and related products. However, you can find it in vegetables
(think fertiliser), recreational waterways (think ducks) and household pets (don't give your dog uncooked
chicken). The bacteria are killed by heating them sufficiently - so well-cooked chicken meat can be less


risky than raw vegetables. In order for us to model a nation's exposure to Salmonella, we need to model
the number of bacteria from each source and how many survive to get into our gut. Let's just take
commercially raised chicken: eggs are produced by layer flocks and sent to farms that hatch the chicks
which are then sent to farms that raise the birds in flocks that are kept in sheds protected from the outside
world. In some countries the bedding is not changed through the life of the flock. Salmonella can enter
from infected hatching farms, wild birds or rodents entering the shed, infected feed, visitors, etc. At a
certain age (you don't want to know just how young) the flock is sent en masse to a slaughter plant
in trucks (which may be clean, but may not be). The chickens get nervous (they've never left the shed
before), the acidity in their stomach changes and the Salmonella multiply many times. The chickens
are then converted into supermarket-ready neat packages in a messy process that involves considerable
redistribution of bacteria between carcasses in a flock, and from one flock following another in the
slaughter plant. Then these neat packages take many forms: wings, breast, oven ready, chilled or frozen,
TV dinners. Scraps that don't make neat supermarket packages are converted into sausages and burgers.
They are sent to warehouses where they are redistributed. Freezing kills some bacteria (we say "are
attenuated") as a function of time and temperature, as does cooking. Some bacteria become dormant,
so when we test them we think they are dead (they won't grow), but they can revive over time and
start multiplying again, so there is some uncertainty about how to interpret the data for attenuation
experiments.
How about the bedding the flock lived on? We eat a lot of chickens and they produce a lot of waste
that the poultry farmer has to get rid of somehow. It gets cleaned out after the chickens have left and used
as fertiliser (at current prices, it would be worth about $30 per ton as a commercial fertiliser equivalent).
However, it can be worth a lot more - in some countries (the USA included) it is fed to cattle as it is
as nutritious as alfalfa hay (sceptical? - try googling "poultry litter cattle feed"). Apparently, however,
it is not as palatable to the cow (which is not hard to believe), so they take some time to get used to it.
The bacteria in the litter are (hopefully) killed by deep stacking the litter and leaving it until it reaches
54 Celsius.
Measuring the number of bacteria in food is a tricky problem. The bacteria tend to grow very fast
(doubling every 20 minutes in good conditions is common) so they tend to be in small locations of
large numbers. The food doesn't smell or look bad, so we have to rely on random samples, which are
a very hit and miss affair. We count the bacteria in logs because that is the only way to make sense
of the huge variation in numbers one sees. Perhaps the most difficult part of the farm-to-fork sequence
(you may prefer the term "moo-to-loo") is the contribution played out in the domestic and commercial
kitchens that prepare the chicken. The chicken itself may be well cooked, but the fat and liquids that
come off the uncooked carcass can spread around and cross-contaminate other foods via hands, knives,
chopping boards, etc. Some researchers have produced mocked-up sterile kitchens and invited people
to prepare a meal, then investigated the contamination that occurs. It can be a lot - I used to have a
kitchen with a black granite top that showed the slightest grease spot - bad choice for a kitchen perhaps,
but it looked beautiful and provided an excellent demonstration of just how difficult it is to completely
clean a surface. Imagine trying to clean a porous wooden chopping board instead. Think about how
hard it is to wash raw chicken fat off your hands. Maybe, like me, you wipe your hands frequently on
an apron or towel (or your clothes) while you are cooking, offering another cross-contamination route. I
am sure that you can see that there are many possible exposure pathways and there is a lot of difficulty
in counting bacteria.
The exposure assessment portion of a microbial risk assessment requires development of a model
to represent the various pathways or scenarios that can occur as product moves through the farm to
consumer continuum. The possibilities are nearly endless. Modelling the myriad pathways through


which a product can move is complex, and models must necessarily be simplified yet still adequately
represent reality. Generally, the different pathways are modelled in a summary form by using frequency
distributions to represent the possible scenarios.
The primary difficulty in aggregating the scenarios, however, is determining what proportion of the
real world each scenario should represent. The number of possible post-processing scenarios for even
a single product pathogen pair is large. If we include cross-contamination of other products, the effort
could become intractable. In spite of best efforts adequately to represent the scenarios in the model,
there can be other important scenarios that are missed simply because they haven't been imagined by
the risk managers or risk assessors.
There are two components in a microbial risk assessment that are rather particular to the field: growth
and attenuation models, which predict the changing numbers of bacteria as a function of environmental
conditions, and dose-response models, which translate the number of bacteria ingested into a probability
of illness. I'll explain them here because the thinking is quite interesting, even if you are not a food
safety specialist (and I promise there won't be anything else that puts you off your food). Then I
recommend you go to www.foodrisk.org/risk~assessments.cfm,
which gives you access to many risk
assessments that use these and other ideas. At WHO and FAO's websites you can also find far more
detailed explanations of these ideas, together with the data issues one faces.

21.1 Growth and Attenuation Models
In microbial food safety the dose measure is assigned a probability distribution to reflect the variation
in dose between exposure events. The probability distribution of a non-zero dose in microbial risk
assessment is usually modelled as being lognormal for a number of reasons:
Changes in microorganism concentrations across various processes in the farm-to-fork chain are
measured in log differences.
Processes are considered generally to have a multiplicative effect on the microorganism number or
concentration.
There are very many (even if perhaps not individually identified) events and factors contributing to
the growth and attenuation of microorganism numbers.
The central limit theorem states that, if we multiply a large number of positive random variables
together with no process dominant, the result is asymptotically lognormal.
For convenience and ease of interpretation, risk assessors work in log base 10, so the exposure dose
is of the form

The most common growth and attenuation models in current use are the exponential, square-root,
Bělehrádek, logistic, modified Gompertz and Baranyi-Roberts models. These models can be divided
into two mathematical groups: those that have memory and those that do not. "Memory" means that
the predicted growth in any time interval depends on the growth history that has already occurred.
In microbial food safety models we often encounter the situation where bacteria move in and out of
conditions (usually temperature) that are conducive to growth, and therefore we need to estimate the
effect of growth across the entire history.


The most common way to estimate the parameters of growth and attenuation models is through
controlled laboratory experiments that grow an organism in a specific medium and under specific environmental conditions. The number of microorganisms is recorded over time, and the various models
are fit to these data to derive the parameter values for this combination of microorganism/growth
medium/environmental conditions. Growth experiments essentially start with a medium naïve to the
organism and then monitor the gradual colonisation of that medium to saturation point. The log number
of organisms thus tends to follow an S-shaped (sigmoidal) function of time (Figure 21.1). The sigmoidal
curve has a gradient (rate of increase in log bacteria numbers) that is a function of time or, equivalently,
of the number of bacteria already present once the lag phase is complete.
Interpretation of these curves requires some care. We can to some extent eliminate the difficulty of
scaling of the amount of the medium in which the bacteria are grown by considering surface density
instead of absolute numbers. However, by using these fitted growth models in risk assessments where
there is more than one opportunity for growth, we are effectively resetting the food medium to be naïve
of bacteria at each growth event and resetting the pathogen to have no memory of previous growth.
Thus, in order to be able to use sequential growth models, we either need some corrective mechanism
to compensate for already being part of the way along a growth curve or we need to accept a simpler
method with fewer assumptions.

21.1.1 Empirical data

Empirical data are often used in microbial food safety to model the effect of current operations on
pathogen numbers. For example, one might estimate the number of pathogenic organisms in the gut of
poultry before and after transportation from shed to slaughter plant to determine any amplification (see,
for example, Stern et al., 1995) or track pathogen levels through a slaughter plant (see, for example,
the data tabulated for Salmonella in poultry in Section 4.4 of FAO/WHO (2000)).
Experiments to enumerate levels of pathogenic organisms at farms and in slaughter plants struggle
with a variety of problems: the detection threshold of the test; the transfer rate to swabs or rinse

solution; the most probable number (MPN) algorithm used to interpret the observed counts; and the
representativeness of samples like swabs from skin, faecal samples, etc. Perhaps the greatest difficulties
in obtaining representative results are caused by the heterogeneity of pathogen distribution between
animals and over carcass surfaces and the several orders of magnitude of variability observed.

Figure 21.1 Schematic diagram of a growth curve with a lag time: the log organism count rises from
log N0 to log Nmax after an initial lag phase.
Slaughter plant processes that redistribute and attenuate pathogen load are highly complex and make
data collection and modelling extremely difficult. For example, Stern and Robach (2003) conducted poultry slaughter plant experiments, enumerating for individual birds the levels in their faeces preslaughter
and on their fully processed carcasses, and found no correlation between the two.

21.1.2 Growth and attenuation models without memory
Exponential growth model

Growth models for replicating microorganisms are based on the idea that each organism grows independently by division at a rate controlled by its environment. For environment-independent growth we
can write

dN/dt = k N

which gives the exponential growth model

Nt = N0 exp(kt)

where N0 and Nt are the number (or density) of microorganisms at times zero and t respectively,
and k is some non-negative constant reflecting the performance of a particular microorganism under
particular stable environmental conditions. This equation produces a growth curve of the form shown
in Figure 21.2.

Figure 21.2 Exponential growth.


In fact, this model only describes the expected value for Nt. The stochastic model is known as the
Yule model (Taylor and Karlin, 1998), which has not yet been adopted in food safety modelling:

where NB is a negative binomial distribution with probability mass function

ModelRisk has functions that will fit a Yule time series to data and/or make a projection. There are
also options for doing the analysis in log10.
Nt has mean μ and variance V given by

μ = N0 exp(kt),    V = N0 exp(kt) (exp(kt) - 1)

Bělehrádek growth model

Extensions to the exponential model can account for different environments. The full temperature
Bělehrádek environment-dependent growth model modifies k to be a function of temperature:

where b and c are fitting constants, T is temperature and Tmin and Tmax are the minimum and maximum
temperature at which growth can occur respectively. It is sometimes known as the square-root model
and leads to

Other versions take into account environmental factors such as pH and water activity. In general,
these models take the form
Nt = N0 exp(t * f(conditions))

Taking logs, we have

ln(Nt) = ln(N0) + t * f(conditions)

So, for example, taking logs of Equation (21.1) gives

Parameter b can be rescaled to be appropriate to a log10 model:


The exponential and Bělehrádek models are memoryless, meaning that they take no account of how
long the microorganisms have already been growing, only how many there are at time t = 0. Thus, we
can break down t into two or more separate divisions and get the same result. For example, let

t = t1 + t2

The number of bacteria at time t1 is given by

Nt1 = N0 exp(k t1)

and the number of bacteria at time t2 is then given by

Nt2 = Nt1 exp(k t2) = N0 exp(k (t1 + t2))

which is the same result as performing the model on t directly.
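A two-line numerical check of this memoryless property (the parameter values are arbitrary):

import numpy as np

N0, k, t1, t2 = 1e3, 0.5, 2.0, 3.0
print(np.isclose(N0 * np.exp(k * (t1 + t2)),
                 N0 * np.exp(k * t1) * np.exp(k * t2)))   # True: one step equals two steps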
Attenuation models

Attenuation models for microorganisms are based on the idea (Stumbo, 1973) that each organism dies
or is inactivated independently and that the event follows a Poisson process whose rate is controlled by
the environmental conditions, which leads to the general exponential attenuation equation

E[Nt] = N0 exp(-kt)

where E[Nt] is the expected number of CFU (colony-forming units, i.e. microorganisms that are capable
of multiplying to form a colony) at time t and N0 is the initial number of CFU prior to growth, and k
is a fixed positive value or a function of the environmental conditions; k^-1 is then the average life of
the pathogen in this environment. Taking logs, we have

ln(E[Nt]) = ln(N0) - kt

This model only describes the expected value for Nt. The stochastic model is known as the "pure
death" model (Taylor and Karlin, 1998) which can be written as

Nt = Binomial(N0, exp(-kt))

ModelRisk has functions that will fit a "pure death" time series to data and/or make a projection.
There are also options for doing the analysis in log10.
The distribution of N at any moment t has mean μ and variance V given by

μ = N0 exp(-kt),    V = N0 exp(-kt) (1 - exp(-kt))


These equations give us a probabilistic way of analysing data on attenuation and death. Imagine that,
on average, a process decreases pathogen load by r logs:

Then the distribution for log[Nt] will be approximately¹

By the pivotal method of classical statistics and the principle of a sufficient statistic, if we have
observed a log reduction of x, i.e.

then we can estimate r to be

The probability that all the pathogenic organisms will be killed by time t is the binomial probability
that Nt = 0, i.e.

P(Nt = 0) = (1 - exp(-kt))^N0

and the time until all the pathogenic organisms are killed or attenuated is, from Poisson theory, the
random variable

t(Nt = 0) = Gamma(...)                                       (21.2)

Gamma distributions are difficult to generate for large Nt, so we can use the normal distribution
approximation to Equation (21.2):²

As with exponential growth models, other versions take into account environmental factors such as
pH and water activity. For example, the Arrhenius-Davey environment-dependent inactivation model
has the following equation:

where a, b, c and d are constants, T is temperature and pH is the pH of the environment.
In general, these models take the form

Nt = N0 exp(-t * f(conditions))

¹ Using the normal approximation to the binomial: Binomial(n, p) ≈ Normal(np, √(np(1 - p))).
² Using the normal approximation to the gamma: Gamma(α, β) ≈ Normal(αβ, β√α), accurate to within at most 0.7 % for α > 50.


21.1.3 Growth and attenuation models with memory

Logistic growth and attenuation model

The logistic³ model dampens the rate of growth by using

dN/dt = k N (1 - N/N∞)

where N∞ is the population size at t = ∞. The concept behind this equation is that a population
naturally stabilises after time to its maximum sustainable size N∞, which will be a function of the
resources available to the population. The differential equation is a simple form of the Bernoulli equation
(Kreyszig, 1993) which, by setting N = N0 when t = 0, integrates to give

Nt = N0 N∞ / (N0 + (N∞ - N0) exp(-kt))

or a more intuitive form of the equation

The equation can be fitted to data from laboratory growth studies that give estimates of k and the
maximum microorganism number (or density) N∞ for the particular environment and organism. The
function takes three forms: if k = 0 there is no change in N with time; if N0 > N∞, then Nt will
decrease (attenuation model) in a sigmoidal fashion towards N∞; and, of most relevance to microbial
food safety, if N0 < N∞, then Nt will increase (growth model) in a sigmoidal fashion towards N∞.
A problem in using Equation (21.3) in microbial risk assessment is the non-linear appearance of N0.
This means that, if we split time into two components, we will not get the same answer, i.e.

where

The reason for this is that the equation assumes that the starting position (t = 0) equates to the
beginning of the sigmoidal curve. It is therefore inappropriate to use Equation (21.3) if one is at some
point in time significantly into the growth (or attenuation) period.
³ The term logistic has no relation to logistic regression. In human population modelling it is called the Verhulst model after the
nineteenth-century Belgian mathematician Pierre Verhulst. In 1840 he predicted what the US population would be in 1940 and was
less than 1 % off, and did just as well with estimating the 1994 Belgian population (excluding immigration). However, his success
with human populations is no comfort to the predictive microbiologist, as Verhulst was dealing with a single environment (a country)
and could derive a logistic growth curve to match historic population data and thus predict future population size. In predictive
microbiology we are modelling growth on a contaminated food, and we don't have the historic data to know where on the logistic
growth curve we are.


The same problem applies to almost all sigmoidal growth models, which include the Gompertz, modified Gompertz and Baranyi-Roberts models which are in frequent use in food safety risk assessments.
The van Boekel attenuation model is an exception.
Van Boekel attenuation model

Van Boekel (2002) proposed an environment-independent Weibull attenuation model that has memory
of the following form

S(t) = exp(-(t/β)^α)

where S(t) is the Weibull probability of surviving after time t, and α and β are fitting parameters to
which he gives some loose physical interpretation.
The model can be restated as

Nt = N0 exp(-(t/β)^α)                                        (21.4)

or, more precisely, as a binomial variable:

Nt = Binomial(N0, exp(-(t/β)^α))
He fitted his model to a large number of datasets and found that it performed better than the simpler
Nt = N0 exp(-kt). Equation (21.4) is linear in N0, which means that, with some elementary algebra, time
can be divided into two or more intervals and the results still hold:

N(t1+t2) = N0 exp(-((t1 + t2)/β)^α) = N(t1) exp((t1/β)^α - ((t1 + t2)/β)^α)

Thus, although this model has memory of previous attenuation, it can still be used for subsequent
attenuation events.
Nt has mean μ and variance V given by

μ = N0 exp(-(t/β)^α)

V = N0 exp(-(t/β)^α) (1 - exp(-(t/β)^α))

The probability that all the pathogenic organisms will be killed by time t is the binomial probability
that Nt = 0, i.e.

P(Nt = 0) = (1 - exp(-(t/β)^α))^N0                           (21.5)

Equation (21.5) can be rearranged to generate values for the distribution of time until all pathogenic
organisms are killed:

t = β (-ln(1 - U^(1/N0)))^(1/α)

where U is a Uniform(0, 1) random variable.
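A small Python sketch of these two results; N0, α and β are illustrative values only.

import numpy as np

rng = np.random.default_rng()
N0, alpha, beta = 10_000, 1.5, 3.0                 # illustrative values

t = 2.0
Nt = rng.binomial(N0, np.exp(-(t / beta) ** alpha))                 # survivors at time t

u = rng.uniform(0, 1, 10_000)
t_extinct = beta * (-np.log(1 - u ** (1 / N0))) ** (1 / alpha)      # time until Nt = 0
print(Nt, t_extinct.mean())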

21.2 Dose-Response Models
There are four dose-response equations in common use that are based on theory conforming to WHO
guidelines (FAO/WHO, 2003) of no threshold (i.e. it only takes one organism to cause an infection) and
are given in Table 21.1.
The first three of these models can be viewed as stemming from a simple binomial equation for the
probability of infection:

P(inf | D) = 1 - (1 - p)^D                                   (21.6)

where p is the probability that a single pathogenic organism causes infection, and D is the number of
pathogenic organisms in the exposure event. Thus, the equation for P(inf) is the binomial probability
that at least one of the D pathogenic organisms succeeds in infecting the exposed person, i.e. 1 minus
the probability that zero organisms will succeed in infecting. Implicit in this equation are two important
assumptions:
The exposure to the D organisms occurs in a single event.
The organisms all have the same probability of causing infection independently of the number of
organisms D.

21.2.1 The exponential model
The exponential dose-response model assumes that the received dose is Poisson distributed with mean
λ, i.e.

Table 21.1 The four most common microbial dose-response models.

D-R model        Dose measure     P(effect)
Exponential      Mean dose λ      = 1 - exp(-λp)
Beta-Poisson     Mean dose λ      ≈ 1 - (1 + λ/β)^(-α)
Beta-binomial    Actual dose D    = 1 - Γ(α + β)Γ(β + D) / (Γ(β)Γ(α + β + D))
Weibull-gamma    Actual dose D    = 1 - (1 + D^b/β)^(-α)

which simplifies to

P(inf | λ) = 1 - exp(-λp)                                    (21.7)

We can think of λp as the expected number of infections a person will have from exposure to this
dose, and the equation for P(inf | λ) is then just the Poisson probability of at least one infection. The
assumption of a Poisson-distributed dose is quite appropriate for modelling bacteria, viruses and cysts
in water, for example, if we can accept that these organisms don't clump together.
Some authors have reinterpreted Equation (21.7) to be

P(inf | D) = 1 - exp(-rD)                                    (21.8)

where r is the same probability p but the Poisson mean λ has been replaced by the actual dose D
received, which is inconsistent with the underlying theory. The potential problem is that Equation (21.8)
will add additional randomness to the dose by wrapping a Poisson distribution around D, which is
inappropriate if D has been determined as an actual dose received. A Poisson(λ) distribution has mean λ and
standard deviation √λ. Thus, for a Poisson-distributed dose with large λ the actual dose received will
be very close to λ; for example, for λ = 1E6 organisms the actual dose received lies with >99 % probability
between 0.9997E6 and 1.0003E6 (Figure 21.3), and using the approximation of Equation (21.8)
adds little additional randomness.

Figure 21.3 A Poisson(1 000 000) distribution overlaid with a Normal(1 000 000, 1000).


Figure 21.4 A Poisson(5) distribution.

However, for low values of D this is not the case. For example, if D = 5 organisms, Equation (21.7)
adds a probability distribution around that value with a 99 % probability range of 1-11 organisms, and
includes a 0.7 % probability that the dose is actually zero (Figure 21.4).

21.2.2 The beta-Poisson model
The beta-Poisson dose-response model assumes that the received dose is Poisson distributed and the
probability of an organism infecting an individual is beta distributed:

Using the same principle as Equation (21.7), this can be rewritten as

The beta distribution has a mean of

α / (α + β)                                                  (21.10)

Integrating out the effect of the beta random variable in Equation (21.9) produces an equation for
P(inf | λ) that involves the Kummer confluent hypergeometric function which is difficult to evaluate


numerically. However, there is a good approximation available for Equation (21.9) if p is small that, from
Equation (21.10), gives the restriction that α << β (Furumoto and Mickey, 1967; Teunis and Havelaar,
2000):

P(inf | λ) ≈ 1 - (1 + λ/β)^(-α)
There is some confusion about what variation is being modelled by the Beta(α, β) distribution.
Equation (21.9) illustrates that to calculate P(inf | λ) we must first select a random value p from the beta
distribution and another value D from the Poisson distribution and perform the calculation 1 - (1 - p)^D.
The scenario being modelled is that in a random exposure event one particular value of p applies, i.e.
that each of the D organisms has the same probability p of causing infection to the human involved in
this exposure event. Allowing p to vary with the beta distribution means that we are acknowledging that
some people will be more susceptible (higher p) than others to infection and/or that some doses will
consist of organisms that are more infective than others. The model does not include the possibility that
the dose will consist of organisms of varying infectiveness. It also cannot distinguish between variations
in human susceptibility and organism infectiveness.
The beta-Poisson model is frequently fitted to feeding trial data. In these trials, doses are prepared
by making suspensions of a pathogenic organism at varying concentrations, and samples of precisely
measured volumes drawn from these suspensions are administered to the participants. Thus, it is reasonable to assume that the dose given to each participant will be Poisson distributed. However, each
suspension is usually produced by first growing a batch of organisms from one colony, which means
that there is little or no variation in the infectiveness across the population of organisms in the doses
administered. The participants in feeding trials are generally healthy young males, and certainly never
knowingly include anyone pregnant, sick or immuno-compromised. Therefore, the α and β beta-Poisson
parameters fitted to the results of a feeding trial can only hope to describe the effect of the variation
in susceptibility of healthy young individuals on the variation in p, and for organisms drawn from a
single colony. Use of the estimated values for α and β in risk assessments means that one must accept
the extrapolation to the much more widely varying human and pathogen populations.

21.2.3 The beta-binomial model
The beta-binomial dose-response model assumes, like the beta-Poisson model, that the probability
of an organism infecting an individual is beta distributed but, unlike the beta-Poisson model, that the
received dose is known, i.e.

There exists a beta-binomial distribution (Section III.7.1) that parallels this model:

The beta-binomial distribution has the probability mass function


where Γ(·) is the gamma function. Following the same thinking as behind Equation (21.6), the probability
of infection is

Some of the gamma function values can extend beyond the range a computer can handle, and mathematical programs often offer the facility to calculate the natural log of a gamma function, so the
following formula is more useful in practice:
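The displayed formulas are missing from the extraction; the following sketch assumes the standard beta-binomial result P(inf | D) = 1 − B(α, β + D)/B(α, β), where B(·,·) is the beta function, and evaluates it on the log scale with log-gamma functions, exactly as the text recommends.

import math
from scipy.special import gammaln

def p_inf_beta_binomial(dose, alpha, beta):
    # P(inf | D) = 1 - E[(1 - p)^D], with p ~ Beta(alpha, beta)
    #            = 1 - Gamma(a+b) Gamma(b+D) / (Gamma(b) Gamma(a+b+D)),
    # computed via log-gamma to avoid overflowing the gamma function
    log_ratio = (gammaln(alpha + beta) + gammaln(beta + dose)
                 - gammaln(beta) - gammaln(alpha + beta + dose))
    return 1.0 - math.exp(log_ratio)

print(p_inf_beta_binomial(dose=1000, alpha=0.2, beta=40.0))   # illustrative values only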

The beta-binomial model uses the same parameters α and β as the beta-Poisson model. If a
beta-Poisson model has been fitted to data (usually feeding trial data) for a risk assessment, and
the risk assessment model is calculating (simulating) the actual exposure dose rather than the mean of
a Poisson-distributed dose, then the beta-binomial model is the most appropriate to use with the α and
β values of the fitted beta-Poisson model.

21.2.4 The Weibull-gamma model
The Weibull-gamma dose-response model begins by assuming that the probability of an organism
infecting an individual can be described by the cumulative distribution function of a Weibull distribution:
P(inf | D) = 1 − exp(−a D^b)      (21.14)

A Weibull distribution is commonly used in reliability theory to describe the time until a machine
or component fails, where the instantaneous failure rate z(t) (the probability of failure in some small
increment of time given survival up to that moment) is a function of time of the form

z(t) = a b t^(b−1)      (21.15)
Equation (21.15) describes a system with memory, meaning that the instantaneous failure rate changes
over the lifetime of the machine. The parameter b is commonly greater than 1 in reliability modelling,
signalling that a component "wears out" over time, i.e. the component becomes more likely to fail. If
b = 1, the formula reduces to

z(t) = a
so z(t) becomes independent of time. In other words, the system becomes memoryless - the component
has no greater probability of failing at any increment of time, given it has not already failed, than at any
other. The parameter a effectively becomes the probability that the system will fail in an incremental
unit of time.
In the dose-response model, time t is replaced by the dose D and the failure of a component becomes
the failure of the immune system to combat the exposure. If we think of the organisms in a dose D
arriving sequentially, then P(inf | D) is the probability of a person not getting infected until the Dth
organism arrives. The parameter a becomes, to a first approximation, the probability that an organism
will independently infect a person. The parameter b can now be thought of as describing how rapidly

the immune system is "wearing out". The larger the value of b, the quicker the immune system is
overcome: b = 1 would mean that the immune system is "memoryless", in other words it is equally
capable of coping with the last organism in the dose as it is of coping with the first; values of b less
than 1 mean that the system becomes progressively more capable of coping with the organisms as they
arrive. We can also switch round the interpretation to say that if b > 1 the organisms "work together",
if b < 1 they work against each other and if b = 1 they work independently.
In the Weibull-gamma model, the parameter a is replaced by a variable Gamma(α, β) to describe the
variation in host susceptibility and organism infectiveness, in a similar fashion to the beta distribution
for the beta-Poisson and beta-binomial models. Thus, Equation (21.14) becomes

P(inf | D) = 1 − exp(−Gamma(α, β) · D^b)

which, after integrating over the gamma density, simplifies to

The Weibull-gamma dose-response model has three parameters, which gives it greater flexibility
to fit data, but at the expense of greater uncertainty. Equation (21.11) for the beta-Poisson model and
Equation (21.16) for the Weibull-gamma model take similar forms, although the conceptual models
behind them are quite different. It is often stated that if b = 1 the Weibull-gamma reduces to the beta-Poisson approximation model of Equation (21.11). This isn't strictly true unless λ is large enough that
one can ignore the approximation of setting λ = D discussed above.
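Equation (21.16) is also missing from the extraction. The sketch below uses the standard published Weibull-gamma form, P(inf | D) = 1 − (1 + D^b/β)^(−α), and illustrates the remark above: with b = 1 it coincides with the beta-Poisson approximation evaluated at the actual dose D (parameter values are invented).

def weibull_gamma(dose, alpha, beta, b):
    # Standard published Weibull-gamma dose-response form (assumed here)
    return 1.0 - (1.0 + dose ** b / beta) ** (-alpha)

def beta_poisson_at_dose(dose, alpha, beta):
    # Beta-Poisson approximation with the mean dose replaced by the actual dose D
    return 1.0 - (1.0 + dose / beta) ** (-alpha)

# With b = 1 the two expressions agree (illustrative parameters only)
print(weibull_gamma(100, 0.3, 50.0, 1.0))
print(beta_poisson_at_dose(100, 0.3, 50.0))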

21.3 Is Monte Carlo Simulation the Right Approach?
The key tool of the microbial risk assessor in constructing foodborne pathogen risk assessment models is
Monte Carlo simulation (MC), using products like Analytica, Crystal Ball, @RISK, Simul8 and bespoke
Visual Basic applications, among others. MC is an excellent tool for constructing a probabilistic model
of almost any desired complexity. It requires relatively little probability mathematics and the models
can be presented in an intuitive manner. It has one major drawback, however. MC requires that all the
parameters be quantitatively determined (with uncertainty if applicable) first, and then the model is run
to make a projection of possible observations. This means that, for example, we must construct a model
to predict the number of human illnesses in the population on the basis of contamination prevalence and
load, tracked through the food production-to-consumption chain, and then convert to a human health
impact through some dose-response model. However, we will often also have some estimate of what
that human health impact currently actually is from data from healthcare providers. In Monte Carlo
simulation, if these two estimates do not match we are somewhat stuck, and the usual approach is to
adjust the dose-response function to make them match. This is not a statistically rigorous approach.
A superior method is to construct a Markov chain Monte Carlo (MCMC) model; currently the most
commonly used environment for this purpose is the freeware WinBUGS.
MCMC models can be constructed in a similar fashion to Monte Carlo models but offer the advantage
that any data available at a stochastic node of the model can be incorporated into the model. The model
then produces a Bayesian revision of the system parameter estimates. Models of possible interventions
can be run in parallel that then estimate the effect of these changes. The great advantage is the ability to


incorporate all available information in a statistically consistent fashion. The major problems in implementing an MCMC approach are that there is no commercial MCMC software available - WinBUGS
is excellent but pretty difficult to use - and that the computational intensity needed for MCMC modelling
means that we could not use models of the level of complexity that is currently standard.
I have often found that the results and behaviour of a complex model can be closely matched by a
far simpler model, and that in turn means that we could use the MCMC method to write the
model and estimate its parameters. In the next section I present some simplifications.

21.4 Some Model Simplifications
21.4.1 Sporadic illnesses
If a population is consuming V units of food items (servings), of which a fraction p are contaminated,
with probability mass function f(D) for the number of pathogenic organisms D of concern at the
moment of exposure, the expected number of infections λ_inf can be estimated as

λ_inf ≈ V p P(inf | exposure)      (21.17)

where

P(inf | exposure) = Σ_{D=1}^{∞} f(D) P(inf | D)

If the infections are sporadic, i.e. they occur independently of each other, and the dose distribution and the distribution of levels of susceptibility are constant across the exposed population, then
P(inf | exposure) can be considered as constant and the actual number of infections will be Poisson
distributed:

Infections = Poisson(λ_inf)

If the probability t of illness of a certain type, or death (given infection has occurred), is independent
of the initial dose (WHO's recommended default) (FAO/WHO, 2003), then Equation (21.17) becomes

λ_ill = t V p P(inf | exposure)      (21.18)

If the exposure events are independent, and we assume that just one person becomes ill per incident,
then λ_ill can again be interpreted as the mean for a Poisson distribution:

Illnesses = Poisson(λ_ill)
Hald et al. (2004) used a WinBUGS model employing this method to determine the fraction of
salmonellosis cases that could be attributed to various food sources.
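A minimal numerical sketch of Equations (21.17) and (21.18), with invented values for V, p, t, the dose distribution and the dose-response curve; it simply carries out the summation for P(inf | exposure) and then draws the Poisson number of illnesses.

import numpy as np
from scipy.stats import poisson

def expected_illnesses(V, p, t, dose_mean, p_inf, d_max=200):
    # lambda_ill = t * V * p * sum_D f(D) P(inf | D), with f(D) taken Poisson here
    doses = np.arange(1, d_max + 1)
    f = poisson.pmf(doses, dose_mean)
    p_inf_exposure = np.sum(f * p_inf(doses))
    return t * V * p * p_inf_exposure

p_inf = lambda d: 1.0 - (1.0 + d / 50.0) ** -0.3     # illustrative dose-response curve
lam_ill = expected_illnesses(V=1e6, p=0.02, t=0.3, dose_mean=5, p_inf=p_inf)
rng = np.random.default_rng(7)
print(lam_ill, rng.poisson(lam_ill))                 # Illnesses = Poisson(lambda_ill)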


21.4.2 Reduced prevalence
If contamination prevalence were to be reduced from p to q and the contaminated dose distribution
were to remain constant for those food items still contaminated (perhaps by reducing the number of
farms, herds, flocks or sheds that were infected), we could say the human health benefit was
Infections avoided ≈ Poisson[V(p − q) P(inf | exposure)]

If we already had a good estimate of the current human infections, and thus λ_inf, we could say

Infections saved = Poisson[(1 − q/p) λ_inf]

Elimination of the P(inf | exposure) means that we no longer need to have information on the dose
distribution or the dose-response function. The same logic applies equally to illnesses. The Vose-FDA
model (Bartholomew et al., 2005) is an example of the use of this approach.

21.4.3 Low-infectivity dose
If the exposure pathways lead to pathogen numbers at exposure that always have a small probability of
infecting a person, then, no matter which dose-response function is used, the probability of infection
will follow an approximately linear function of dose:

P(inf | D) ≈ k D

If the probability t of illness of a certain type, or death (given infection has occurred), is independent
of the initial dose, then we get

P(ill | D) ≈ k D t

For an exposure with dose distribution with probability mass function f(D), we have

P(ill | exposure) = Σ_{D=1}^{∞} f(D) k D t ≈ k t D̄

and for a volume V of product consumed that has a prevalence p of food items that are contaminated
we can write

λ_ill ≈ V p k t D̄

again with the interpretation

Illnesses = Poisson(λ_ill)


Thus, without loss of generality, for pathogens causing sporadic infections where the probability of
infection is low for all exposures, we can state that the expected number of illnesses is a linear function
of both the prevalence of contamination p and the mean dose on a contaminated item D̄, or equivalently
of the total number of pathogenic organisms to which the population is exposed.
I have used this approach to help model the effects of oyster and clam contamination with Vibrio
parahaemolyticus when very few data were available.
The linearity of this model is not to be confused with the often criticised low-dose linear extrapolation
used by the EPA (United States Environmental Protection Agency) in toxicological risk assessments.
The EPA approach is to consider the lowest dose D at which a probability of an effect P is observable,
then to extrapolate the dose-response curve with a straight line from (D, P) down to (0, 0), which it
is generally agreed gives an exaggeratedly conservative estimate of the low-dose probability of an effect,
leading to very low tolerance of environmental exposure. The difference between these approaches lies
in the problem of getting an observed effect in toxicological exposure: an unrealistically high dose
(relative to real-world exposures) is needed to observe an effect in experimental exposures. Experiments
are needed because, when a person presents in the real world with some toxicological symptoms
(e.g. cancer), it is rarely possible to attribute the case to exposure to a particular compound, or indeed
to the level of exposure, which may be chronic and aggregated with exposure to other compounds. In
microbial food safety we are largely relieved of this problem: the pathogenic organisms can often be
cultured, the exposure is usually a single event rather than chronic, and current wisdom suggests that
microorganisms infect a person individually rather than through repeated (chronic) exposure, allowing
the use of binomial mathematics (Equation (21.6) above).

Chapter 22

Animal import risk assessment
This chapter and Chapter 21 have evolved from a course we have run many times in various forms
entitled "Advanced Quantitative Risk Analysis for Animal Health and Food Safety Professionals". I very
much enjoy teaching this particular course because the field requires that one models both uncertainty
and variability explicitly. The problems that one faces in this area are very commonly modelled with
combinations of binomial, Poisson and hypergeometric processes. It is therefore necessary that one has
a good understanding of most of the material presented so far in this book in order to be able to perform
a competent assessment of the risks in question. This chapter can therefore be seen as a good revision
of the material I have discussed, and will also hopefully illustrate how many of the techniques presented
can be used together. The rather mathematical basis for these assessments is in complete contrast to the
analysis of projects in Chapter 19, and I hope that, between that chapter and this one, the reader will
benefit from seeing two such diverse approaches to risk assessment.
Animal import risk assessments are concerned with the risk of introducing a disease into a country or
state through the importation of animals or their products. Food safety risk assessments are concerned
with the risk to a population or subpopulation of infection, morbidity (illness) or mortality from consuming a product contaminated with either some toxin or some microorganism. We will look at certain
animal import problems first, and then consider a few problems associated with microbiological food
safety.
Animal import risk assessment is becoming increasingly important with the removal of world trade
barriers and the increasing volume of trade between countries. On the one hand, consumers have
increased choice in the food that they consume and farmers have greater opportunity to improve their
livestock through importing genetic material. On the other hand, the opening of a nation's borders
to foreign produce brings with it an increased risk of disease introduction, usually to the livestock
of that nation or the native fauna. The benefits of importing have to be balanced against the associated risks. This is not always easy to do, of course. Very often, the benefits accrue to certain
people and the risks to others, which can lead to political conundrums. The World Trade Organisation (WTO) has encouraged nations to base their assessments on standards and guidelines developed by
WTO member governments in other international organisations. These agencies are the Office
International des Épizooties (OIE) for animal health and the International Plant Protection Convention
(IPPC) for plant health. WTO nations have signed an agreement stating that they will open their
borders to the produce of others and are committed to removing any trade barriers. The caveat in
that agreement is that a nation is at liberty to ban the import of a product from another nation if
that product presents a risk to the importing nation. Now, a product may present a risk to one nation
but none to another. For example, importing pig meat on to some Pacific island may not present
a risk if there are no live pigs on that island to catch any disease that is present in the meat. On
the other hand, the same meat could easily present a significant risk to a country like Denmark
that has pigs everywhere. Thus, a country that bans the importation of a product on the basis that
it presents a significant risk must be prepared to demonstrate the rationale behind the ban. It could

not, for example, allow in product A but not product B where product A logically presents a higher
risk. In other words, the banning nation must be prepared to demonstrate consistency in its importing
policies. Quantitative risk assessment can often provide such demonstration of consistency, and the
OIE, Codex Alimentarius and IPPC continue to discuss and develop risk assessment techniques. The
OIE has published (OIE, 2004) a handbook on quantitative import risk analysis that is largely drawn
from the second edition of this book and the material from our courses, but also includes some more
material.
The agricultural ministries of many nations are constantly being bombarded with requests to import
products into the country. Many of the requests can be processed very quickly because the risk of disease
introduction is, to the experienced regulator, very obviously either too large to be acceptable or negligible. One
can often quickly gauge the size of the risk by comparing the product with another similar product for
which a full quantitative analysis has been done. For products whose risk lies somewhere between these
two extremes, one may devise an import protocol that ensures the risk is reduced to an acceptable level
at the minimum cost. Protocols usually involve a combination of plant and farm inspection, quarantining
and testing for animal imports, and specifications of product source, quality controls and processing for
animal product imports. Quantitative risk assessment provides a method for determining the remaining
risk when adopting various protocols, and therefore which protocols are the least trade restrictive and
most effective.
I have talked about the "risk" to an importing country in a very general way. A "risk" has been
defined (Kaplan and Garrick, 1981) as the triplet {scenario, probability, impact}, where the "scenario"
is the event or series of events that leads to the adverse effect, the "probability" is the likelihood of
that adverse event occurring and the "impact" is the magnitude of the adverse event. At the moment,
quantitative risk assessments in the animal import world mostly concern themselves with the first two
of this triplet because of the political and economic pressures associated with any disease introduction,
no matter how small the actual impact (number of infected animals, for example) on the importing
country. This is a little unfortunate, since any truly rational management of risk must take into account
both probability and impact.
Before proceeding any further, it will be helpful for the reader unfamiliar with this area to learn about
a few basic concepts of animal diseases and their detection and a little bit of terminology. The reader
familiar with this area will note that I have left out a number of terms for simplicity.
True prevalence, p. The proportion of animals in a group or population that are infected with a
particular disease.
Sensitivity, Se. The probability that an animal will test positive given that it is infected. This conditional probability is a measure of the quality of the test: the closer the test sensitivity is to unity,
the better the test.
Specificity, Sp. The probability that an animal will test negative given that it is not infected. This
conditional probability is also a measure of the quality of the test: the closer the test specificity is
to unity, the better the test.
Pathogen. The microorganism causing infection.
Infection. The establishment of the microorganism in an animal.
Disease. The physiological effect in the infected animal. The animal may be infected but not diseased,
i.e. show no symptoms.
Morbidity. Exhibiting disease.


22.1 Testing for an Infected Animal
This section looks at various different scenarios that frequently occur in animal import risk analyses
and the questions that are posed. We start with the simplest cases and build up the complexity. Some
formulae are specific to this field and are derived here for completeness and because the derivations
show interesting, practical examples of the algebra of probability theory.

22.1.1 Testing a single animal
Problem 22.1: An animal has been randomly selected from a herd or population with known
(usually estimated) disease prevalence p. The animal is tested with a test having sensitivity Se and
specificity Sp. (i) If the animal tests positive, what is the probability it is actually infected? (ii) If
it tests negative, what is the probability it is actually infected?
Solution: Figure 22.1 illustrates the four possible scenarios when testing an animal. (i) If the animal tests
positive, either path A or path C must have occurred. Thus, from Bayes' theorem (see Section 6.3.5),
the probability that the animal is infected, given it tested positive, is given by

P(inf | +) = P(A) / (P(A) + P(C)) = p Se / (p Se + (1 − p)(1 − Sp))

(ii) Similarly, the probability that the animal is infected, given it tested negative, is given by

P(inf | −) = P(B) / (P(B) + P(D)) = p(1 − Se) / (p(1 − Se) + (1 − p) Sp)

Figure 22.1 The four scenarios when testing an animal for infection.
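A minimal numerical check of the two results above (the prevalence, Se and Sp values are invented for illustration):

def p_infected_given_positive(p, se, sp):
    # Bayes' theorem over paths A and C of Figure 22.1
    return p * se / (p * se + (1 - p) * (1 - sp))

def p_infected_given_negative(p, se, sp):
    # Bayes' theorem over paths B and D of Figure 22.1
    return p * (1 - se) / (p * (1 - se) + (1 - p) * sp)

print(p_infected_given_positive(0.05, 0.9, 0.95))    # ~0.49
print(p_infected_given_negative(0.05, 0.9, 0.95))    # ~0.0055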


22.1.2 Testing a group of animals
Usually, animals are imported in groups, rather than individually. Live animals, kept together during
transportation or during quarantine, will often spread a pathogen among the group if one or more are
infected, and the level of infection often reaches a stable percentage of the group if it is large enough.

Problem 22.2: A group of n animals are to be imported, of which s are infected. All animals are
tested with a test having sensitivity Se and specificity Sp. (i) How many will test positive? (ii) What
is the probability that all animals will test negative?
Solution: Of the s infected animals, Binomial(s, Se) will test positive. Similarly, of the (n − s) animals
not infected, Binomial(n − s, 1 − Sp) will test positive. (i) The total number of positives is therefore
Binomial(s, Se) + Binomial(n − s, 1 − Sp). (ii) The probability that all animals will test negative,
P(all −), is the probability that all infected animals will test negative, i.e. (1 − Se)^s, multiplied by the
probability that all animals not infected will test negative, i.e. Sp^(n−s). Thus, P(all −) = (1 − Se)^s Sp^(n−s).
A common error in this type of problem is to argue the following. Each of the n animals has a
probability p = s/n of being infected and a probability Se that it would then test positive. Similarly,
each animal has a probability (1 − p) of not being infected and a probability (1 − Sp) that it would
then test positive. The number of positives is thus Binomial(n, pSe) + Binomial(n, (1 − p)(1 − Sp)).
The calculation is incorrect since it assigns independence between the health status of each randomly
selected animal, as well as between the distribution of the number of infected animals and the number of uninfected
animals, where in fact they must always add up to n. A useful check is to look at the range of
possible values this formula would generate: each of the binomial distributions has a maximum of n,
so their sum gives a maximum of 2n positives - clearly not a possible scenario since there are only n
animals. +
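A simulation sketch contrasting the correct and incorrect formulations of Problem 22.2; the parameter values are chosen (arbitrarily) so that the incorrect formula visibly produces impossible results.

import numpy as np

rng = np.random.default_rng(3)
n, s, se, sp, iters = 20, 15, 0.8, 0.7, 100_000

# Correct: positives = Binomial(s, Se) + Binomial(n - s, 1 - Sp)
correct = rng.binomial(s, se, iters) + rng.binomial(n - s, 1 - sp, iters)

# Incorrect: treats each animal's infection status as independent
p = s / n
wrong = rng.binomial(n, p * se, iters) + rng.binomial(n, (1 - p) * (1 - sp), iters)

print((correct > n).mean())              # 0.0: can never exceed n animals
print((wrong > n).mean())                # > 0: an impossible scenario
print((1 - se) ** s * sp ** (n - s))     # P(all animals test negative)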

Problem 22.3: A group of n animals are all tested with a test having sensitivity Se and specificity
Sp. If s animals test positive, how many in the group are actually infected?
Solution: This problem is looking from the opposite direction to Problem 22.2. In that problem, one
was estimating how many positives there would be from a group with a certain rate of infection. Here
we are estimating how many infected animals there were in the group given the observed number of
positives.
There is actually a certain number x of animals that are truly infected in the group of n; we just don't
know what that number is. Thus, we need to determine the distribution of our uncertainty about x. This
problem lends itself to a Bayesian approach. With no better information, we can conservatively assume
a priori that x is equally likely to be any whole number between 0 and n, then determine the likelihood
function, which is a combination of two binomial functions in the same way as Problem 22.2.
The solution to this problem is algebraically a little inelegant and is therefore more readily illustrated
by assigning some numbers to the model parameters. Let n = 50, Se = 0.8, Sp = 0.9 and s = 25.
Figure 22.2 illustrates the spreadsheet model that determines our uncertainty about x for these parameter
values, and Figure 22.3 shows the posterior distribution for x.
It is rather interesting to see what happens if we change some of the parameter values. Figure 22.4
illustrates four examples: I leave as an exercise for you to work out why the graphs take the form they
do for these parameter values.
As an aside, if s = 0 the data would lead us to question whether our parameter values were correct.
The probability of observing no positives (s = 0) would be greatest if there were no infected animals
in the group, in which case, using our original parameter values, the probability that s = 0 is given

Figure 22.2 Spreadsheet model for Problem 22.3, determining the uncertainty about x (n = 50, Se = 0.8, Sp = 0.9, s = 25).
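A minimal sketch of the Bayesian construction behind Figure 22.2: a discrete uniform prior on the number of truly infected animals x and a likelihood built from the two binomial processes described above (true positives among the x infected, false positives among the n − x uninfected).

import numpy as np
from scipy.stats import binom

n, se, sp, s = 50, 0.8, 0.9, 25
x = np.arange(n + 1)
prior = np.ones(n + 1)                                   # uniform over 0..n

likelihood = np.zeros(n + 1)
for xi in x:
    j = np.arange(0, min(xi, s) + 1)                     # j = true positives
    likelihood[xi] = np.sum(binom.pmf(j, xi, se) *
                            binom.pmf(s - j, n - xi, 1 - sp))

posterior = prior * likelihood
posterior /= posterior.sum()
print(x[np.argmax(posterior)], posterior.max())          # mode of the posterior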
P(>0 infected ∩ all −ve) = Σ_{x=1}^{n} C(n, x) (p(1 − Se))^x (Sp(1 − p))^{n−x}      (22.1)

Now, the binomial theorem (Section 6.3.4) states that

(a + b)^n = Σ_{x=0}^{n} C(n, x) a^x b^{n−x}

Comparing this with Equation (22.1), we can see that Equation (22.1) can be simplified to

P(>0 infected ∩ all −ve) = (p(1 − Se) + Sp(1 − p))^n − (Sp(1 − p))^n


(ii,b) Each animal is accepted or rejected individually, so we can look at the fate of an individual
animal first and then extrapolate to get the required answer. The probability that an animal is infected
and escapes detection is p(1 − Se). The probability that this does not happen (i.e. the probability that
an animal is not infected, or is infected but is detected) is then 1 − p(1 − Se). The probability that
this happens to all animals is then (1 − p(1 − Se))^n, and so, finally, the probability that at least one
animal is infected but not detected and therefore not rejected is

P(>0 infected ∩ all −ve) = 1 − (1 − p(1 − Se))^n

We could also have arrived at the same equation by taking the summation approach of (ii,a).
The probability that there will be x infected animals (where 0 ≤ x ≤ n) is given by the binomial
probability mass function

P(x) = C(n, x) p^x (1 − p)^{n−x}

The probability that one fails to detect all x animals is 1 − Se^x, so the probability that there are one
or more remaining infected animals in the consignment is

Σ_{x=1}^{n} C(n, x) p^x (1 − p)^{n−x} (1 − Se^x)

Again, using the binomial theorem, this reduces to

P(>0 infected ∩ all −ve) = [1 − (1 − p)^n] − [(1 − p + pSe)^n − (1 − p)^n] = 1 − (1 − p(1 − Se))^n

Clearly, the probability of at least one infected animal entering undetected in a consignment is greater if
animals are rejected individually when they test positive than if the whole group is rejected when there
is at least one positive test, because with group rejection we have the opportunity of fortuitously rejecting
an infected animal that has tested negative. Thus, 1 − (1 − p(1 − Se))^n > (1 − pSe)^n − (1 − p)^n. Figure 22.12 illustrates
the relationship between these two equations for n = 20 and varying prevalence p and sensitivity Se.
By solving the following equation for p

we see that the probability that at least one animal will be infected in a group when the animals are
rejected as a group is a maximum when the prevalence is given by


Figure 22.12 Relationship between probability of an infected group when the animals are rejected individually or as a group (n = 20; p and Se varying).

An importing country is not usually worried about the infected consignments it rejects before importation, but is more concerned about the probability that a consignment that has been accepted is infected.
This requires a simple Bayesian revision, as illustrated in the next problem.
Problem 22.9: An animal population has prevalence p. You are to import consignments of size n
from this population (n is much smaller than the population size). If all animals are tested with
a test having sensitivity Se and specificity Sp, what is the probability that a consignment that has
passed the test and therefore has been allowed entry into the country is infected if (a) one or more
positive tests result in the rejection of the whole group and (b) positive tests result in the rejection
of only those animals that tested positive.
Solution: The reader is left to prove the following results:

(a) P(infected | tested −ve) = [(p(1 − Se) + Sp(1 − p))^n − (Sp(1 − p))^n] / (p(1 − Se) + Sp(1 − p))^n

(b) P(infected | tested −ve) = [1 − (pSe + (1 − p))^n] / [1 − (pSe + (1 − p)(1 − Sp))^n]
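A simulation sketch that checks both results against their analytical forms (the parameter values are invented):

import numpy as np

rng = np.random.default_rng(11)
p, se, sp, n, iters = 0.05, 0.9, 0.95, 10, 200_000

infected = rng.random((iters, n)) < p
positive = np.where(infected,
                    rng.random((iters, n)) < se,          # true positives
                    rng.random((iters, n)) < (1 - sp))    # false positives

# (a) the whole group is rejected if any animal tests positive
passed_a = ~positive.any(axis=1)
print(infected[passed_a].any(axis=1).mean())
num_a = (p*(1 - se) + sp*(1 - p))**n - (sp*(1 - p))**n
print(num_a / (p*(1 - se) + sp*(1 - p))**n)

# (b) only the animals that test positive are rejected
passed_b = ~positive.all(axis=1)                          # at least one negative test
entered_infected = (infected & ~positive).any(axis=1)     # an infected animal slips through
print(entered_infected[passed_b].mean())
print((1 - (p*se + (1 - p))**n) / (1 - (p*se + (1 - p)*(1 - sp))**n))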

22.4 Confidence of Detecting an Infected Group
Veterinarians are often concerned with the number of animals in a group that need to be tested, given a
certain test, in order to be reasonably certain that one will detect infection in the group if it is infected
with the pathogen in question. Provided that we have a strong belief of what the prevalence within the
group will be, if infected, we can determine the number of animals to test. We must first of all state
what we mean by being "reasonably certain" that we will detect the infection by saying that there is
a probability α that we would fail to detect the infection (we have no positive tests) were it there. α is
usually in the order of 0.1-5 %.

22.4.1 A perfect test with a very large group
A perfect test means that the test has Se = Sp = 1. If we know that the prevalence would be p, should
the group be infected, then the probability of failing to detect the infection in sampling s animals is
given by the binomial probability

p(0) = (1 − p)^s = α

where s is the number of animals tested. Rearranging this formula gives

s = ln(α) / ln(1 − p)

For example, if p = 30 % and α = 1 %, then

s = ln(0.01) / ln(0.7) = 12.9

so we would have to test at least 13 animals to have 99 % probability of detecting any infection that
was there.

22.4.2 A perfect test with a smaller group
Assuming we know the prevalence would be p, should the group be infected, and the group size is M,
there will be pM infected animals. Then the probability of failing to detect the infection is given by the
hypergeometric probability p(0):

This formula is hard to work out for large M and s. Section III.10 shows that a very good approximation is given by

where s is again the number of animals tested. The formula is too complicated to rearrange for s, but it
can still be solved graphically. Figure 22.13 illustrates a spreadsheet where the number of tested animals
is determined to be 14 when α = 5 %, p = 20 % and M = 1000. The formula can also be solved using
a spreadsheet optimiser like Evolver, as shown at the bottom of the worksheet.

Figure 22.13 Interpolating model for Section 22.4.2.

A more convenient approximation to p(0) is given by Cannon and Roe (1982), by expanding the
factorials of p(0) and taking the middle value of the numerator and denominator, which gives

Again, this formula can be used graphically to find the required level of testing.

22.4.3 An imperfect test with a very large group
If we know that the prevalence would be p, should the group be infected, and that the test has sensitivity
Se and specificity Sp, then the probability of observing zero positives in a test of size s is the combined
probability that all infected animals test negative falsely (with probability 1 − Se) and all uninfected
animals test negative correctly (with probability Sp). Thus, we have

p(0) = (p(1 − Se) + (1 − p)Sp)^s = α

from the binomial theorem (Section 6.3.4). Rearranging, we get

s = ln(α) / ln(p(1 − Se) + (1 − p)Sp)

So, for example, if we have a test with Se = 0.9 and Sp = 0.95, and are fairly certain that the
prevalence would be 0.2, the number of animals we would have to test to have a 90 % probability of
detecting infection in the group is calculated by

s = ln(0.1) / ln(0.2 × 0.1 + 0.8 × 0.95) = 9.3

So we would have to test at least 10 animals.
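A minimal sketch of this sample-size calculation (the function name is mine); setting Se = Sp = 1 recovers the perfect-test result of Section 22.4.1.

import math

def animals_to_test(prevalence, se, sp, alpha):
    # Smallest s with (p(1 - Se) + (1 - p)Sp)^s <= alpha, i.e. at least a
    # (1 - alpha) chance of one or more positives if the group is infected
    p_negative_per_animal = prevalence * (1 - se) + (1 - prevalence) * sp
    return math.ceil(math.log(alpha) / math.log(p_negative_per_animal))

print(animals_to_test(prevalence=0.2, se=0.9, sp=0.95, alpha=0.1))    # 10
print(animals_to_test(prevalence=0.3, se=1.0, sp=1.0, alpha=0.01))    # 13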

22.4.4 An imperfect test with a smaller group
This problem requires that one use a hypergeometric sample with the probabilities of getting negative
results from the previous problem, which gives

This equation is not easily rewritten to give a value of s for a given probability α, but s can still
be reasonably easily determined by setting up a spreadsheet, using Stirling's approximation for large
factorials and using the linear solver to get the required value of α. ModelRisk has several summation
functions that are very helpful for these types of calculation: VosejSum (see below), VosejkSum (performs a two-dimensional sum) and VosejSumInf (calculates the sum of an infinite series provided it
converges). For example, the equation above could be written with ModelRisk as

= VosejSum("VoseHypergeoProb(j, s, D, M, 0) * ((1 - Se)^j) * (Sp^(s - j))", 0, s)

The VosejSum function sums the expression within " " over integer values of j between 0 and s.
The ModelRisk VoseHypergeoProb function in the expression has no problems with large factorials and,
unlike Excel's HYPGEOMDIST, returns correct values across the parameter ranges, including returning
a zero probability when a scenario is not possible instead of the #NUM! that HYPGEOMDIST returns.
Now we can use Excel's Solver to vary s until the above formula returns the required value for α.
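The same summation can be sketched outside Excel; the snippet below (a sketch, not the book's model) sums the hypergeometric probabilities directly and then searches for the smallest sample size s meeting a target α, with invented group and test parameters.

import numpy as np
from scipy.stats import hypergeom

def p_zero_positives(M, D, s, se, sp):
    # P(no positives) in a sample of s from a group of M containing D infected:
    # sum over j = number of infected animals that end up in the sample
    j = np.arange(0, s + 1)
    return np.sum(hypergeom.pmf(j, M, D, s) * (1 - se) ** j * sp ** (s - j))

M, D, se, sp, alpha = 1000, 200, 0.9, 0.95, 0.05     # illustrative values
s = 1
while p_zero_positives(M, D, s, se, sp) > alpha:
    s += 1
print(s)                                             # smallest acceptable sample size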

22.5 Miscellaneous Animal Health and Food Safety Problems
Problem 22.10: I trap 100 Canada geese. Surveillance data show that the population has a 3 %
prevalence of foaming beak disease (FBD). How many will have FBD? Canada geese also suffer
from irritable feet syndrome (IFS), with a prevalence of 1 %. How many of my 100 geese will have
at least one of these diseases? How many will have both?
Solution: The number of geese with FBD can be modelled as Binomial(100, 3 %). The probability that
a goose will have at least one of FBD and IFS is 1 − (1 − 3 %)(1 − 1 %) = 3.97 %. The number of
geese with at least one disease is therefore Binomial(100, 3.97 %).
The probability that a goose will have both FBD and IFS is 3 % × 1 % = 0.03 %. The number of
geese with both diseases is therefore Binomial(100, 0.03 %). These last two answers assume that the
probability of a bird having IFS and the probability of it having FBD do not depend on whether it has the other disease. +


Problem 22.11: Disease Y had a 25 % prevalence among the 120 bulls randomly tested from a
country's population. Disease Y also had a 58 % prevalence among the 200 cows tested from
that country's population. According to extensive experiments, a foetus has a 36 % probability of
contracting the disease from an infected parent. If 100 foetuses are imported from this country
and if it is assumed that each foetus has different parents, what is the distribution of the number
of infected foetuses that will be imported?
Solution: Uncertainty about prevalence among bulls p_B can be estimated as Beta(31, 91). Uncertainty
about prevalence among cows p_C can be estimated as Beta(117, 85). One of four scenarios is possible
with the parents: neither parent has the disease, either one has, or both have. They have the following
probabilities:

Neither: (1 − p_B)(1 − p_C)
Bull only: p_B(1 − p_C)
Cow only: (1 − p_B) p_C
Both: p_B p_C

Let p be the probability of infection of a foetus if one parent is infected (= 36 %). Then

Probability of foetus infection given neither parent is infected: 0
Probability of foetus infection given one parent is infected: p
Probability of foetus infection given both parents are infected: 1 − (1 − p)^2

Thus, the probability that a foetus is infected is

P(inf) = [p_B(1 − p_C) + (1 − p_B) p_C] p + p_B p_C [1 − (1 − p)^2]

and the number of infected foetuses is equal to Binomial(100, P(inf)). +
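A simulation sketch of this solution, propagating the two beta uncertainties through to the number of infected foetuses:

import numpy as np

rng = np.random.default_rng(5)
iters, p = 50_000, 0.36

p_b = rng.beta(31, 91, iters)        # bulls: 30 positive out of 120
p_c = rng.beta(117, 85, iters)       # cows: 116 positive out of 200

p_inf = (p_b * (1 - p_c) + (1 - p_b) * p_c) * p + p_b * p_c * (1 - (1 - p) ** 2)
foetuses = rng.binomial(100, p_inf)
print(foetuses.mean(), np.percentile(foetuses, [5, 95]))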

Problem 22.12: A large vat contains 10 million litres of milk. The milk is known to be contaminated
with some virus, but the level of contamination is unknown. Fifty samples of 1 litre are taken from
the vat and tested by an extremely reliable procedure. The test reports a positive on each of the
samples if there are one or more virus particles in each sample. The results therefore do not
distinguish how many virus particles there are in a positive sample. Estimate the concentration of
virus particles if there were seven positive tests.
It is known that one needs to consume some eight particles in a single dose of milk in order to
have any chance of being infected. It is also considered that people consume, at most, 10 litres
of milk in a single helping. What is the probability P_10 that somebody consuming a 10 litre volume
of this milk will consume an infective dose?
Solution: We can use Bayesian inference to estimate the concentration of virus particles in the milk.
Let us assume that the virus particles are perfectly mixed within the milk. This is probably a reasonable
assumption as they are extremely small and will not settle to the bottom, but this does assume that there
is no clumping. The total sample volume taken from the milk is very small compared with the volume
in the vat, so we are safe to assume that our sampling has not materially altered the virus concentration
in the vat. This would not be the case, for instance, if the vat were very small, say 10 litres, and we had
sampled five of those litres with one positive result, since we could well have removed the only virus
particle in the entire vat.


Method 1. We can now see that the number of virus particles in a sample of milk will be Poisson(λt)
distributed, where λ is the mean concentration per litre in the milk and t is the sample size in litres.
If there were some clumping, we would probably use a Poisson distribution to describe the number of
clumps in a sample, and some other distribution to describe the number of virus particles in each clump.
If λ is the concentration of virus particles per litre, the probability of having no virus particles in
a sample of 1 litre is given by the Poisson probability mass function for x = 0, i.e. p(0) = exp[−λ].
The probability of at least one virus particle in a litre sample is then 1 − exp[−λ]. The probability of
having s infected samples out of n is given by the binomial probability function, since each sample is
independent and has the same probability of being infected, i.e.

l(λ) = C(n, s) (1 − exp[−λ])^s (exp[−λ])^{n−s}

Using an uninformed prior p(λ) = 1/λ and the above equation with n = 50 and s = 7 as the likelihood
function for λ, we can construct the posterior distribution. The points on this curve can then be entered
into a relative distribution and used to calculate the probability P_10 in the question using the Excel
function POISSON as follows:
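The Excel formula referred to above is missing from the extraction. The sketch below reproduces Method 1 numerically, assuming that "an infective dose" means eight or more particles in the 10 litre helping: a grid posterior for λ is built from the 1/λ prior and the binomial likelihood, and P_10 is averaged over it.

import numpy as np
from scipy.stats import poisson

n, s = 50, 7
lam = np.linspace(0.001, 1.0, 2000)                # particles per litre
prior = 1.0 / lam                                  # uninformed 1/lambda prior
likelihood = (1 - np.exp(-lam)) ** s * np.exp(-lam) ** (n - s)
posterior = prior * likelihood
posterior /= posterior.sum()

p10_given_lam = 1.0 - poisson.cdf(7, 10 * lam)     # P(8 or more particles in 10 litres)
print(np.sum(posterior * p10_given_lam))           # P_10, averaged over the posterior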

Method 2. In method 1 we looked at the Poisson process first, then at the binomial process. Now
we start from the other end. We know that the 50 samples are all independent binomial trials and that
we had seven successes, where a success is defined as an infected sample. Then we can estimate the
probability of success p using the beta distribution:

p = Beta(8, 44)

From method 1 we also know that p equates to 1 − exp[−λ], so λ = −ln[1 − p] = −ln[1 − Beta(8, 44)] = −ln[Beta(44, 8)]; the last identity occurs because switching the parameters for a beta distribution is equivalent to switching what we define to be successes and failures. In using the beta
distribution in this fashion, we are assuming a Uniform(0, 1) prior for p. It is interesting to look at what
this would equate to as a prior for λ, i.e. the distribution of λ when λ = −ln[Uniform(0, 1)]. It turns
out that this means λ has an Expon(1) distribution or, equivalently, a Gamma(1, 1) distribution. The
reader can prove this by using the Jacobian transformation or, more simply, by making a comparison
with the cumulative distribution function of the exponential distribution. The prior in method 1 gives
π(λ) ∝ 1/λ, whereas the prior for λ in the second method is π(λ) ∝ exp[−λ]. These two priors
are quite different, illustrating some of the difficulties in determining an uninformed prior. +
A prior where π(θ) ∝ 1/θ is very close to a Gamma(a, 1/a) distribution where a is very large. The
gamma distribution is the conjugate prior for the Poisson likelihood function. In estimating the Poisson
mean λ number of events per period, using a Gamma(a, β) prior and a Poisson likelihood function for S
observations in n periods, we get (see Section 8.3.3) a posterior distribution for λ equal to a Gamma(a +
S, β/(1 + βn)). Thus, using the Expon(1) = Gamma(1, 1) prior, one gets a posterior equal to

λ | observations = Gamma(S + 1, 1/(n + 1))

while using the π(θ) ∝ 1/θ prior yields roughly

λ | observations = Gamma(S, 1/n)


The difference in these two equations shows that it does not take too large a set of data for the
importance of the form of the prior to be overpowered by the likelihood function (i.e. S + 1 ≈ S and
n + 1 ≈ n). +

Problem 22.13: 1000 eggs, each of 60 ml, are mixed together. It is estimated that 100 of those
eggs have Salmonella. After mixing, 60 ml of liquid egg is removed and consumed. If there are
100 Salmonella CFU/infected egg, how many CFU will be consumed in that 60 ml? What is the
probability that at least 1 CFU will be consumed? How much liquid egg will need to be consumed
to receive the minimum infectious dose of 12 CFU? [A CFU is a colony-forming unit - a surrogate
measure for the number of individual organisms.]
Now, instead of a minimum infectious dose, we use the following dose-response model:

where x is the consumed dose. What is the probability of becoming infected from consuming a
60 ml volume of liquid egg?
Solution: One in every 10 eggs is infected, so the average Salmonella concentration will be 10
CFU/60 ml. Taking one egg's worth of liquid from the mass of 1000 eggs is a small amount, and,
if the eggs are well mixed, we can assume that a sample will follow a Poisson distribution of CFU with
the above average concentration. The number of CFU in one 60 ml sample will then take a Poisson(10)
distribution. The probability of at least 1 CFU in that volume = 1 − p(0) = 1 − exp(−10) = 99.996 %.
The amount of egg that needs to be consumed for a dose of 12 CFU is Gamma(12, 6) ml, since CFU
are encountered at a mean rate of one per 6 ml.
Using the dose-response model, the probability of becoming infected P is the Poisson probability of
consuming x CFU multiplied by P_inf(x), summed over all possible values of x, i.e.

which has no closed form but can be evaluated quickly by summing an array in a worksheet or using
ModelRisk:

where 1000 is easily large enough, or more correctly

which will sum the expression over j from 1 onwards in integer steps until a value has been reached
with a precision of 0.000000001.
The answer is 83.679 %. +
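The dose-response curve and the ModelRisk formulas above did not survive extraction. As an illustration only, the sketch below performs the summation with an assumed exponential dose-response P_inf(x) = 1 − exp(−x/5), a form that closely reproduces the quoted 83.679 %.

import numpy as np
from scipy.stats import poisson

x = np.arange(0, 200)
p_inf = 1.0 - np.exp(-x / 5.0)                  # assumed dose-response, illustrative only
print(np.sum(poisson.pmf(x, 10) * p_inf))       # ~0.8368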

Problem 22.14: Over the last 20 years there has been an average of four outbreaks of disease Z
in rural swine herds owing to contact with feral pigs. If an outbreak usually infects Normal(100,
30) pigs and if an individual pig is worth $Normal(120, 22), calculate the total cost of outbreaks
for the next 5 years.
Assuming there is a 5 % chance that contact of a rural swine herd with domestic pigs will result
in an outbreak of disease Z, estimate how many actual contacts occur each year given that the
average number of outbreaks per year is 5.


Solution: We can assume that each outbreak is independent of every other, in which case the outbreaks
follow a Poisson process. With an average of four outbreaks per year, and assuming the rate of outbreaks
is constant over the last 20 years, we can model the outbreaks in the next 5 years as Poisson(4 × 5) =
Poisson(20).
The number of outbreaks is a variable, so we have to add up a sum of a varying number of Normal(100,
30) distributions to get the number of infected pigs. Then we have to add up a varying number of
Normal(120, 22) distributions, depending on the number of infected pigs, to get the total cost over the
next 5 years of these outbreaks. One might think the solution would be

Total cost = Poisson(20) × Normal(100, 30) × Normal(120, 22)
The answer seems at first glance to be quite intuitive, but fails on further inspection. Imagine that
the Poisson distribution in a Monte Carlo simulation produces a value of 25, and the Normal(100, 30)
distribution produces a value of 160. That is saying that, on average, the 25 outbreaks produce 160
infected pigs each. The 160 value is two standard deviations above the mean, and the probability of
the 25 outbreaks all taking such high values is extremely small. We have forgotten to recognise that
each of the 25 distributions (of the number of infected pigs in each of the 25 outbreaks) is independent.
However, this can easily be accounted for by using the central limit theorem (see Section 6.3.3). The
correct approach is shown in the spreadsheet of Figure 22.14. Figure 22.15 shows the difference between
the results for the correct approach and the incorrect formula shown above. One can see that the incorrect
approach, failing to recognise independence between the number of infected pigs in each outbreak and
between the value of each infected pig, produces an exaggerated spread for the distribution of possible
cost.
Note that it is a simple matter to use the central limit theorem in this problem, because the theorem
works exactly when summing any number of normal distributions. However, if the number of infected
pigs in an outbreak were Lognormal(70, 50) - a fairly skewed distribution - we would need to be
adding together some 30 or so of these distributions to get a good approximation to a normal for
the sum. Since the Poisson(20) distribution has a small probability of generating values above 30,
we may be uncomfortable with using the shortcut of the central limit theorem. The spreadsheet of
Figure 22.16 revisits the problem, using the lognormal distribution, where we are adding up a varying
number of lognormal distributions depending on the value generated by the Poisson distribution. The
VoseAggregateMC function is equivalent to the array model (see Sections 11.1 and 11.2.2).
In solving this problem, we have assumed exact knowledge of each parameter. However, undoubtedly
there will be some uncertainty associated with these parameters, and it may be worth investigating
Figure 22.14 Spreadsheet model for Problem 22.14. Formulae table: number of outbreaks in 5 years = VosePoisson(4*5); pigs infected = ROUND(VoseNormal(100*outbreaks, 30*SQRT(outbreaks)), 0); cost (output) = VoseNormal(120*pigs, 22*SQRT(pigs)).

Figure 22.15 Distribution of difference between correct and incorrect approaches for Problem 22.14 (x axis: total cost of outbreaks).
Figure 22.16 Spreadsheet model for Problem 22.14 using lognormal distributions. Formulae table: D2 = VosePoisson(4*5); C5:C55 = IF(B5>$D$2, 0, VoseLognormal(70,50)); C57 = ROUND(SUM(C5:C55), 0); C58 (output) = VoseNormal(120*C57, 22*SQRT(C57)); C58 (alternative) = VoseAggregateMC(VosePoisson(4*5), VoseLognormalObject(70,50)).
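A Python sketch equivalent to the aggregate model of Figure 22.16: a Poisson-varying number of Lognormal(mean 70, sd 50) outbreak sizes is summed, and the resulting number of infected pigs is costed with the central limit theorem shortcut used in the text (all numbers as in the problem; the seed and iteration count are arbitrary).

import numpy as np

rng = np.random.default_rng(2024)
iters = 20_000

# Convert the lognormal's arithmetic mean/sd to the underlying normal parameters
mean, sd = 70.0, 50.0
sigma2 = np.log(1 + (sd / mean) ** 2)
mu = np.log(mean) - sigma2 / 2

total_cost = np.empty(iters)
for i in range(iters):
    outbreaks = rng.poisson(4 * 5)                                   # next 5 years
    pigs = np.round(rng.lognormal(mu, np.sqrt(sigma2), outbreaks).sum())
    total_cost[i] = rng.normal(120 * pigs, 22 * np.sqrt(pigs)) if pigs > 0 else 0.0

print(total_cost.mean(), total_cost.std())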

exactly how well specified these parameters really are. The only parameter for which we can quantify
the uncertainty in this problem is the mean number of outbreaks per year. To do this, we must still
assume that the rate of outbreaks is constant over 20 years (which seems unlikely for most countries
owing to changes in international trade and veterinary practices, for example), which may be our biggest


Figure 22.17 Spreadsheet model for the uncertainty in λ for Problem 22.14.

source of uncertainty. That aside, the uncertainty about the mean outbreaks per year λ can be readily
quantified using Bayes' theorem. We can use an uninformed prior:

and a likelihood function

since there were 80 outbreaks in the last 20 years. This is given by the Excel function POISSON(80,
20λ, 0). Figure 22.17 shows the spreadsheet calculation and resultant uncertainty distribution for λ
which, unsurprisingly given the fairly large amount of data (see Section III.9.2), looks very close to
being normally distributed. This distribution can be used to create a second-order model, separating
uncertainty and variability, as described in Section 4.3.2. +
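A sketch of the calculation behind Figure 22.17, assuming (as in Method 1 of Problem 22.12) a 1/λ uninformed prior together with the Poisson likelihood for 80 outbreaks in 20 years:

import numpy as np
from scipy.stats import poisson

lam = np.linspace(0.5, 8.0, 1500)                    # mean outbreaks per year
posterior = (1.0 / lam) * poisson.pmf(80, 20 * lam)  # prior x likelihood
posterior /= posterior.sum()

mean = np.sum(lam * posterior)
sd = np.sqrt(np.sum((lam - mean) ** 2 * posterior))
print(mean, sd)                                      # ~4 per year, near-normal shape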
Problem 22.15: 100 turkeys, randomly selected from a population, were tested for infectious bursal
disease (IBD). 17 were infected. Of these 17, six had infected kidneys. After freezing these infected
kidneys for 10 days at −5 °C, only half still had viable IBD virus particles.
You are importing turkey kidneys from this population at a rate of 1600/year. What is your
estimate of the probability/year of importing at least one kidney with viable IBD virus? Note: it takes
12 days for the product to ship to your country and 2 days to clear customs. You, as the regulating
authority, may set importing restrictions (i.e. storage temperatures) for the importer.
Solution: I leave this as an exercise, but with a hint: you should model the fact that each turkey has
two kidneys, and that, if one kidney becomes infected, so will the other. In other words, the kidneys
are being produced and exported two by two. The problem leaves a number of questions open, in that
you need to make some assumptions. This is often the way with a risk analysis: once you get started,
you realise that some additional information could be useful. +


Problem 22.16: Over the last 6 years, the following numbers of outbreaks of disease XYZ in sheep
have occurred in your area: 12, 10, 4, 8, 15, 9. It is thought that about 80 % of these outbreaks
originate directly from attacks by wild dogs. It has further been estimated that contact with wild
dogs has around a 20-50 % chance of resulting in an infection.
Recent experiments have shown that putting llamas in with herds of sheep would halve the
number of attacks by dogs. (Llamas apparently have an in-built hatred of dogs and head butt
them on sight. Appearing to a dog to be a sheep on steroids, they also have an unpleasant habit
of spitting.) Estimate the number of outbreaks next year if llamas were put to use in all flocks
throughout your area.
Solution: This problem is a bit mean, in that I have added some irrelevant information. First of all, an
uncertainty distribution for the mean number of outbreaks per year λ is determined in the same manner
as in Problem 22.14. After llamas have been installed there will only be 50 % of the 80 % of outbreaks
due to dogs, plus the 20 % that are not, i.e. the new mean number of outbreaks per year would be 0.6λ
and the number of outbreaks next year will follow a Poisson(0.6λ) distribution. +

Appendix I

Guide for lecturers
Risk analysis can be a fascinating topic to learn if the course focuses on the problem solving part, supported by
plenty of examples. I tend to keep the theory to a minimum in my lectures and try to find visual methods to explain
probability ideas rather than formulae. The type of person who is likely to be a successful risk analyst is more
usually attracted by the problem solving side of the work and the practical application of mathematics and logic
rather than the mathematics per se. I always start a course by getting people to say a little about themselves, and
this helps me find examples that have the greatest resonance with the particular audience. In my classes I set lots
of problems and ask people to work in pairs, for small problems, or groups of six or so for larger problems. This
makes a course a lot more dynamic, gives people a sense of achievement when they solve a problem and really
helps people understand better because they have to debate and defend or give up their ideas. I also keep a box of
little chocolates on my desk and throw them to course participants when they make a good comment, are first to
solve a problem, etc. It helps keep things fun and is surprisingly motivating.
At Vose Software we have an educational programme in which graduate and postgraduate students at accredited
universities can obtain copies of ModelRisk for a nominal charge through a bulk order via their university. The software is a full version but time limited to 1 year. More information is available at www.vosesoftware.com/academic.htm. We have placed a lot of emphasis on user-friendly interfaces for as much of the functionality of ModelRisk
as possible (see Appendix II), which makes it ideally suited as a teaching aid. Below I give some ideas on what
to include in a risk analysis course for various disciplines.
Risk management

A major issue faced by risk analysts is that the management to whom they report don't fully understand what a
risk analysis is, or how to use it. In my view it would be a great help if MBAs or similar business and public
administration courses offered a basic introduction to risk analysis. I suggest the following:
Chapters 1 to 5, to give a background on the purpose of risk analysis and how to use the results;
Chapter 7, to explain how a simulation works;
Chapter 17 on checking and validation.
Insurance and finance risk analysis

Insurance and finance risk modelling is probably the most technical area of risk. I suggest:
Chapter 5, to illustrate how results can be expressed and used;
Chapters 6, 8 to 13, 16 to 18 and 20 for an in-depth technical course.
Animal health

Animal health has quite an emphasis on modelling the probabilities of different pathways via which disease can
spread. I would focus on:
Chapters 1 to 5, to set the scene;

Chapter 6 for the probability ideas;
Chapters 8, 9 and 17 for the technical aspects of modelling;
Chapter 22 for topic-specific ideas.
Business investment risk
Typical business investment problems involve deciding whether to invest in a new venture or expand on a proven
venture. The analyses are usually performed using discounted cashflows. I recommend:
Chapters 1 to 5, to set the scene;
Chapter 7 on how to run a model;
Chapters 9 and 10 for analysing data and fitting distributions;
Chapter 11 on sums of random variables - this is an area in which people make lots of mistakes;
Chapter 12 for forecasting time series;
Chapter 13 on correlation, particularly the subjective modelling of correlation;
Chapter 14 - perhaps the most important chapter in this area, since SMEs are often the source of most estimates
in an investment analysis;
Chapter 16 on optimisation, as this helps determine the best investment strategy, especially for things like
staged investment;
Chapter 18 for topic-specific ideas.
Inventory management and manufacturing
While not strictly risk analysis, we do a lot of this type of work because of the statistical analysis one can apply
to historical inventory and demand data and production profiles. I recommend:
Chapter 5, to explain how to present and interpret results;
Chapters 6 to 10, 12 and 13 for technical material;
Chapter 16 on optimisation;
Chapter 17 on model validation.
Microbial food safety
There has been an over-emphasis in the past on rather abstract and complex models of microbial food safety.
I recommend:
Chapters 1 to 5, to set the scene and give some tools for the risk analyst to express the level of confidence one
should have in the model's outputs;
Chapters 8 to 11 and 17 for the technical aspects of modelling;
Chapter 15 for causal thinking;
Chapter 21 for topic-specific ideas.

Appendix II

About ModelRisk
ModelRisk is a comprehensive risk analysis software tool with many unique features that make risk modelling
more accessible, easier and more transparent for novice and advanced users alike. The design concepts and features
in ModelRisk are the result of struggling with available risk analysis software to solve our clients' problems. It is
the tool that realises our view on how good and defensible risk analyses should be performed. It is our answer to
those who argue that complex models are too vulnerable to errors and simple ones are a better alternative. With
ModelRisk, complex models can be built as easily as simple ones, and in almost no time!
The main idea behind creating such a tool was to make the most advanced risk analysis techniques accessible to
a wide range of users including those who do not have the programming capabilities required to build a complex
model. We put a lot of effort into making the software user friendly and leaving as much of the complex mathematics
as possible "behind the scenes" so that a modeller can be guided through the chosen method and be sure s/he is
using it correctly. We think this feature alone should save about 70 % of the modeller's time, as it is a well-known
fact that reviewing and debugging a model is more time consuming than developing it.
In search for an ideal tool for risk analysis modelling, we asked ourselves a few questions:
What methods and theories are widely known in the industry?
How are they currently used and what are the problems with their implementation?
How can we make a tool both simple and intuitive to use and also flexible enough to make it possible to model
any complex customised situation?
How can we make it easy to present and explain a complex model to a decision-maker?
ModelRisk was the answer to all of these questions, providing the modeller with the following features:
It is based on the most recent achievements in industry-specific risk analysis theory.
It is Excel-based: a widely used Excel spreadsheet environment provides the best foundation for making risk
analysis techniques available to a wide range of users.
It is flexible enough to model any complex customised business case.
It includes many building blocks (tools) that allow the creation of complex models within minutes.
It offers immediate help and thorough explanations of all tools.
It gives warnings of errors and suggestions for corrections during model development in order greatly to increase
the speed of modelling and debugging.
It provides excellent precision of outcomes that is comparable with leading non-Excel statistical packages.
It provides a visual interface that is great both for a self-check when building a model and for presenting the
thinking and results to a reviewer.
If needed, it can be used outside Excel for integration with other applications.

ModelRisk and Excel
ModelRisk is an add-in to Microsoft Excel, and its tools fully comply with Excel function rules, which makes
its use intuitive for those who are familiar with the Excel spreadsheet environment. Even though ModelRisk
uses the English language for its tools, it can work seamlessly on any language platform, including different
language versions of Windows, Excel and various simulation tools. ModelRisk can be called from any programming
environment that can make direct calls to dynamic link libraries (DLLs), including VBA, VB, C++ and others.
ModelRisk runs seamlessly in Excel with any Monte Carlo spreadsheet simulation package. This means that you
can combine ModelRisk with Crystal Ball, @RISK or any other Monte Carlo simulation Excel add-in, using these
add-ins to control how you run simulations and present the results, as well as making full use of more sophisticated
features like sensitivity analysis and optimisation.
ModelRisk is offered as an industry-specific package of analysis and modelling tools, and the first version has
been developed for insurance and finance. The full list of existing and planned ModelRisk versions for other industries is available at www.vosesoftware.com.
Distribution functions
ModelRisk has more than 65 univariate distributions and the ability to calculate the probability density (or mass), cumulative probability and percentile for each of them. The functions take the following format (using a Normal(μ, σ) distribution as an example):
VoseNormal(μ, σ, U) calculates the Uth percentile;
VoseNormalProb({x}, μ, σ, 0) calculates the joint probability of observing the array of values {x};
VoseNormalProb({x}, μ, σ, 1) calculates the joint cumulative probability of observing {x}.
All probability calculations can also be done in log10 space, as joint probability calculations often result in numbers that are too small to be supported by Excel. ModelRisk then performs the internal calculations in log space for greatest precision.

U-parameter functions

Functions with a U-parameter (e.g. VoseNormal(μ, σ, U)) can also be used to generate random values from the distributions using the inversion method (Section 4.4.1). Since the U-parameter represents a distribution's Uth percentile (0-100 %), making U a random sample from a Uniform(0, 1) distribution will give a valid sample from the Normal(μ, σ) distribution. For example:

VoseNormal(μ, σ, RAND())               using Excel
VoseNormal(μ, σ, RiskUniform(0, 1))    using @RISK
VoseNormal(μ, σ, CB.Uniform(0, 1))     using Crystal Ball

The U-parameter in ModelRisk functions is always optional, and if omitted the function will return a random sample from the distribution by internally sampling from a Uniform(0, 1) using the Mersenne twister random number generator.¹
¹ The Mersenne twister is a pseudorandom number generator developed in 1997 by Makoto Matsumoto and Takuji Nishimura that is based on a matrix linear recurrence over a finite binary field F2. It provides fast generation of very high-quality pseudorandom numbers, and was designed specifically to rectify many of the flaws found in older algorithms.

Because all ModelRisk univariate distributions have a consistent format and a U-parameter, it is very easy to correlate all of them using the five different correlation methods (copulas²) available in ModelRisk. The ModelRisk copulas offer a variety of correlation patterns and provide far better control over the correlation than the more usual rank order correlation. They can also be fitted to data and compared statistically. A k-dimension copula function returns k random samples from a Uniform(0, 1) distribution, which are correlated according to a certain copula pattern. Thus, if the copula-generated values are being used as U-parameters in ModelRisk distributions, those distributions will be correlated. For example:
A1:B1  =VoseCopulaBiClayton(10, 1)    a two-cell array function generating values from a Clayton(10) copula
A2:    =VoseLognormal(3, 1, A1)       a Lognormal(3, 1) distribution with the U-parameter referenced to the first copula value
B2:    =VoseNormal(0, 1, B1)          a Normal(0, 1) distribution with the U-parameter referenced to the second copula value

A scatter plot of the values generated by the Clayton copula looks like this:

The correlation between the two parent (normal and lognormal) distributions will take the following pattern:
² In statistics, a copula is a multivariate joint distribution defined on the n-dimensional unit cube [0, 1]ⁿ such that every marginal distribution is uniform on the interval [0, 1].
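As a generic sketch of the same idea in Python (not ModelRisk; the alpha value and the two normal marginals below are stand-ins chosen for illustration), a bivariate Clayton copula can be sampled by the conditional method and its two uniforms fed into the marginals' inverse CDFs:

import random
from statistics import NormalDist

def runif():
    # strictly inside (0, 1) so powers and inverse CDFs are well defined
    return random.uniform(1e-12, 1.0 - 1e-12)

def clayton_pair(alpha):
    # Conditional sampling of a bivariate Clayton(alpha) copula:
    # returns two correlated Uniform(0, 1) values.
    u, w = runif(), runif()
    v = (u ** (-alpha) * (w ** (-alpha / (1.0 + alpha)) - 1.0) + 1.0) ** (-1.0 / alpha)
    return u, v

alpha = 10.0                               # assumed copula parameter
u, v = clayton_pair(alpha)
x = NormalDist(3, 1).inv_cdf(u)            # first marginal (placeholder for the Lognormal(3, 1) above)
y = NormalDist(0, 1).inv_cdf(v)            # second marginal, correlated with x through the copula
print(x, y)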

As shown above, Crystal Ball, @RISK, etc., can also provide the random number generator for ModelRisk
distributions. This feature allows you to take advantage of Latin hypercube sampling (LHS) from ModelRisk
distributions and other features like targeted sampling.
Using Crystal Ball or @RISK random number generators as engines for sampling from ModelRisk distributions
also allows integration of ModelRisk simulation output statistics into Crystal Ball or @RISK output interfaces as
well as advanced tools like sensitivity analysis. In other words, simulation with ModelRisk distributions is as easy
as with native Crystal Ball or @RISK distributions.

Object functions
ModelRisk offers a unique approach to defining and manipulating random variables as objects, allowing unprecedented flexibility in modelling complex industry-related issues. For each univariate distribution, ModelRisk has an object function that takes the form Vose{DistributionName}Object(parameters), for example VoseNormalObject(μ, σ).

Modelling distributions as objects is a new concept in spreadsheet programming. It helps to overcome the
limitations of Excel and adds flexibility that is available in high-end statistical packages. If the object function is
put directly into a cell, it returns a text string, for example
"VoseNormal(Mu, Sigma)"
However, for ModelRisk this cell now represents a distribution, which can be used as a building block in many
other tools. The user can use the reference to the object function to calculate a statistic or generate random numbers
from the object's distribution. For example, if we write

A1: =VoseNormalObject(0, 1)    the object function

then the following formulae will access the Normal(0, 1) distribution defined in cell A1:

A2: =VoseSimulate(A1)         takes a random sample from the Normal(0, 1)
A3: =VoseSimulate(A1, 0.7)    calculates the 70th percentile of the Normal(0, 1)
A4: =VoseProb(3, A1, 0)       calculates the density of the Normal(0, 1) at x = 3
A5: =VoseProb(3, A1, 1)       calculates the cumulative probability of the Normal(0, 1) at x = 3
A6: =VoseMean(A1)             returns the mean of the Normal(0, 1)
A7: =VoseVariance(A1)         returns the variance of the Normal(0, 1)
A8: =VoseSkewness(A1)         returns the skewness of the Normal(0, 1)
A9: =VoseKurtosis(A1)         returns the kurtosis of the Normal(0, 1)

The object functions are particularly useful in modelling combinations of distributions. The Splice tool, for example, models a splice of two distributions. In the figure below, a Gamma(3, 0.8) on the left is being spliced onto a shifted Pareto2(4, 6) on the right at a splice point of 3. The figure shows a typical ModelRisk interface: with this constructed distribution one can select to insert a function into Excel that is an object, that simulates, that calculates a probability density (the option shown), that calculates a cumulative probability or that provides the inversion function.

Objects are also used in modelling many other tools, such as aggregate distributions. For example, the FFT aggregate tool uses the fast Fourier transformation method to construct the aggregate loss distribution, with the frequency of claims distributed as Poisson(50) and the severity of each claim as Lognormal(10, 5).
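The same aggregate can be approximated by brute force outside ModelRisk; the Python/numpy sketch below sums a Poisson(50) number of lognormal severities per iteration (the Lognormal(10, 5) mean and standard deviation are converted to the underlying normal's parameters, which is how numpy expects them):

import numpy as np

rng = np.random.default_rng(1)

def aggregate_sample(freq_mean=50.0, sev_mean=10.0, sev_sd=5.0):
    # One Monte Carlo draw of the aggregate loss: a Poisson number of claims,
    # each with a lognormal severity of the given mean and standard deviation.
    s2 = np.log(1.0 + (sev_sd / sev_mean) ** 2)     # variance of the underlying normal
    m = np.log(sev_mean) - 0.5 * s2                 # mean of the underlying normal
    n = rng.poisson(freq_mean)
    return rng.lognormal(m, np.sqrt(s2), size=n).sum()

samples = np.array([aggregate_sample() for _ in range(10_000)])
print(samples.mean())    # should be close to 50 * 10 = 500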

As mentioned earlier, the object functions can be used directly to calculate the moments of any object without performing simulation, as with the VoseMean and VoseVariance examples above.

Menu
Most of the ModelRisk tools can be accessed through the menu in Excel's menu bar:

The View Function tool on top of the menu brings up the ModelRisk Formula View window which is almost the
same as Excel's Formula Bar but recognises ModelRisk functions and attaches hyperlinks to their corresponding
interfaces:

The Formula View bar always stays on top so that the user can quickly browse through all ModelRisk tools in
the current cell.
ModelRisk distributions are sorted into several industry-related categories to help the user identify the right
distribution:

The Distribution category also has the Splice tool, which we've seen before, and the Combined tool, which
allows the modelling of a combined distribution of several subjective opinions:

The Combined tool is typical of the ModelRisk approach. In risk analysis we often use subjective estimates of
variables in our model, preferably having more than one expert providing an estimate for each variable. These
estimates will not completely match, so we need to represent the resultant combined distribution correctly. The
Combined tool does this automatically, allows you to weight each opinion and returns plots of the resultant
distribution and statistics. The combined distribution can also be turned into an object and used as a building block
for the other ModelRisk tools.
Risk events

A risk event calculation is another tool we devised in response to a genuine practical need. A risk event is an event
that has a certain probability of occurring and, if it did occur, would have an impact following some distribution.
In risk analysis modelling, we normally require two distributions to model this: a Bernoulli(p) multiplied by the
impact distribution. The problem is that, in having two distributions, we cannot do a correct sensitivity analysis or
perform other calculations like determining the variable's moments or percentiles. The Risk Event tool allows the
modelling of a risk event as a single distribution and, again, the risk event distribution can be converted into an object:

=VoseRiskEventObject(0.3, VoseLognormalObject(5, 2))
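A generic Python sketch of the underlying idea (not the ModelRisk function itself; the 30 % probability matches the example above, while the lognormal parameters here are placeholders for whatever impact distribution applies):

import random

def risk_event_sample(p=0.3):
    # Bernoulli(p) decides whether the risk occurs; if it does, draw an impact.
    if random.random() < p:
        return random.lognormvariate(1.5, 0.5)   # placeholder impact distribution
    return 0.0

samples = [risk_event_sample() for _ in range(100_000)]
print(sum(samples) / len(samples))               # roughly p times the mean impact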
Copulas
ModelRisk has an interface for the bi- and multivariate copula tools showing scatter plots between the correlated
variables:

ModelRisk can fit the different copulas to the data and has another interface window for that:

The screenshot above shows that the bivariate data (multivariate is also possible) are taken from location
$A$3:$B$219 and shown as red (light) dots. ModelRisk estimates the fitted Clayton copula parameter alpha to
be 14.12, and we get a visual verification of the fit by overlaying sampled values from Clayton(14.12) on the
same chart as blue (dark) dots. Moreover, we can add uncertainty to our fitted parameter estimate by checking the
corresponding option, and the sampled values in blue are now samples from the Clayton copula with an alpha that
is an uncertainty distribution.
ModelRisk has some unique copula features. Bivariate copulas can be rotated to "pinch" in any quadrant, and
we offer a multivariate empirical copula that will match to any observed pattern.
Aggregate modelling

ModelRisk has a large selection of aggregate modelling tools using sophisticated recursive and fast Fourier transformation (FFT) methods. The Panjer, de Pril and FFT tools allow the construction of an aggregate claim distribution
based on the distributions of the frequency of claims and severity of each claim. It is highly laborious, and sometimes simply impossible, to do such calculations manually in a spreadsheet, while in ModelRisk it is only a one-cell
expression. The Aggregate interface shows the frequency, severity and constructed aggregate plots, as well as a
comparison of the constructed distribution's moments with their theoretical values, where they are available, so
that the modeller can see how good the aggregate approximation is:

You can also fit a number of distributions to the calculated aggregate model by matching moments at the click of a
button, which overlays the fitted distribution and compares statistics. The Aggregate pack of tools also allows modelling of the aggregate distribution of multiple frequency:severity aggregates using the FFT and brute-force (Monte
Carlo) methods. An example of the latter is shown in the screenshot below, where the two risks have correlated
frequency distributions equal to Poisson(50) and Pólya(10, 1), and corresponding severity distributions modelled as Lognormal(10, 5) and Lognormal(10, 2). The level of correlation between the two frequency distributions is
described by a normal copula with a correlation parameter of 0.9:

Time series
ModelRisk has a set of time series modelling functions, including variations of geometric Brownian motion (GBM)
models with mean reversion, jump diffusion or both, seasonalised GBM, etc. Common financial time series are
included, such as AR, MA, ARMA, ARCH, GARCH, APARCH, EGARCH and continuous-time Markov chain:

You can view a time series with as many possible paths as you wish and generate new sets of paths on screen,
giving you a visual confirmation that the series is performing as you want. In addition to the ability to simulate
the time series in a spreadsheet, ModelRisk has the tools to fit all of its time series to the data:

The above screen capture shows the original time series data on the left and the forecast sample from the fitted
GBM model on the right. The Time Series Fit tool allows the inputting of the array of past time stamps (i.e. for
certain time series it is not necessary that all observations are made at regular intervals, allowing, for example,
missing observations) with which the historical data were collected and the future time stamps to specify the exact
points of time for which the prediction values need to be modelled. Much like with the copula fit tool, the time
series fit functions have an uncertainty parameter that switches the uncertainty about the fitted parameters on and
off. With the uncertainty parameter off, the fit tool returns forecasts using the maximum likelihood estimates (MLEs).
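For orientation, the simplest member of that family, a plain geometric Brownian motion, can be simulated in a few lines of Python (the drift, volatility and time step below are assumptions for illustration, not fitted values):

import math
import random

def gbm_path(s0=100.0, drift=0.05, vol=0.20, dt=1.0 / 252.0, steps=252):
    # S(t + dt) = S(t) * exp((drift - vol^2 / 2) * dt + vol * sqrt(dt) * Z)
    path = [s0]
    for _ in range(steps):
        z = random.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp((drift - 0.5 * vol ** 2) * dt + vol * math.sqrt(dt) * z))
    return path

print(gbm_path()[-1])    # one simulated end-of-year value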
Other features
Among many other tools, ModelRisk has the following highly technical features:
distribution fitting of censored or truncated data and modelling of the uncertainty about the fitted parameters;
goodness of fit of copulas, time series and distributions using three information criteria (Section 10.3.4);
insurance fund ruin and depletion models, which model the inflows to and outflows from the insurance fund and calculate the distributions of the chance of ruin, the ruin time, etc.;
a powerful, unique approach to extreme-value modelling, making it possible, for example, to calculate directly
the probability that the largest of a million claims following a certain distribution will not exceed some value
X with 95 % confidence;
direct determination of insurance premiums using standard actuarial methods;
one-click statistical analysis of data, including bootstrapping;

multivariate stochastic dominance analysis;
determination of a portfolio's efficient frontier;
Wilkie time series models, which allow simulation of Wilkie time series in a spreadsheet;
and many more.
Help file
In addition to all of the modelling tools described above, ModelRisk comes with a specialised version of ModelAssist for insurance and finance that shows how to use ModelRisk, with hundreds of topics explaining any relevant theory, example Excel/ModelRisk model solutions, videos, a search engine and much more!
The help file is fully integrated into Excel and the ModelRisk interfaces, providing links to the help file from
all the places you most need them.
Custom applications and macros

ModelRisk can also be used in any programming language that supports calls to DLLs, for example Visual Basic,
C++ and Delphi. ModelRisk has a COM-object called "ModelRisk Library", which is automatically registered in
a local system when ModelRisk is installed.
To access ModelRisk functions from Visual Basic, for example, one needs to register ModelRisk as a reference
for the VB project as shown in the screenshot below:

An example of a VB routine is given below:
Sub TestMR()
    ' Sample from the bounded Normal(0, 1) distribution
    Dim Sample As Double
    Sample = ModelRisk.VoseNormal(0, 1, , VosePBounds(0.1, 0.9))

    ' Calculate the combined probability of observing
    ' the values 125, 112 and 94 from a Poisson(100) distribution
    Dim Prob As Double
    Dim Values As Variant
    Values = Array(125, 112, 94)
    Prob = ModelRisk.VosePoissonProb(Values, 100, 0)

    ' Use the central limit theorem to sample from the
    ' distribution of the sum of 120 Normal(25, 3) distributions
    Dim CLT As Double
    CLT = ModelRisk.VoseCLTSum(120, 25, 3)

    ' Print the output values to the Debug console
    Debug.Print Sample, Prob, CLT
End Sub

ModelRisk's COM Object offers a limited number of ModelRisk tools, such as distributions. The full set of ModelRisk tools is available to programmers in the ModelRisk SDK, which allows integrating the full power of ModelRisk into custom, non-spreadsheet stand-alone applications. An example of using ModelRisk SDK in C++ is shown
in Exhibit 1. More information regarding the ModelRisk SDK is available at www.vosesoftware.com/ModelRiskSDK.htm.
Vose Software can also develop stand-alone or integrated ModelRisk-based risk analysis applications for you.
We can create a user-friendly interface around the ModelRisk engine that addresses the client's requirements and
leverages the efficiency of the final product by using our extensive expertise in developing risk analysis models.
For more information regarding customised software development, please contact info@vosesoftware.com.
Exhibit 1. Using ModelRisk SDK in C++
ModelRisk SDK offers C++ developers direct access to the ModelRisk function library. The best way to do this is to use <> and <> for access to the Core library - <>. The <> file is compatible with <>. The following steps need to be taken to include ModelRisk library functions in your C++ project:
1. In MS Visual Studio, create an empty C++ project (skip this step for existing projects).
2. In the menu <>, select <>.
3. At the bottom of the <> window, change the <> setting to <>.
4. Use the <> window explorer to change the current folder to the ModelRisk installation folder: <<...Program Files\Vose Software\ModelRiskSDK\>>.
5. Using <>, select <> and <> and click the <> button in the bottom-right corner of the <> window.
6. Now you can directly call all ModelRisk functions declared in <>.
Common rules for using C++ VoseCore functions (VCFs) and error handling
1. All VCF types are <>. They return <> if the calculation finished without errors, or <> if one or more options are invalid.
2. If a VCF returns <>, a developer can use VoseCoreError() to get an error string (it returns a char* buffer address).

3. All VCFs return the result to one or more first argument(s). For example, if a VCF returns a single value (for example, <>), the result will be returned to the first parameter of type <<&double>>. If the VCF returns a set of values (for example, <>), then the result is placed in the first two arguments of <> and <> types (the first argument is an array, the second its length). Some of the VCFs return more than two outputs (such as VoseTimeGARCHFit - it calculates and returns four values of type <>).
4. Memory allocation rule: for all VCFs that return arrays of values, the output array must be allocated by the developer (statically or dynamically) and its length must be equal to the second argument.
Using distribution core functions
All VoseDistribution function declarations must be in the following format:

VoseDistribution_Core(double rc,           // the output
    type1 arg1 [, type2 arg2 ...],         // distribution arguments
    double *pU,                            // pointer to the percentile value
    int *pBoundsMode,                      // address of <> flag
    double *pMin,                          // pointer to the minimum limit value
    double *pMax,                          // pointer to the maximum limit value
    double *pShiftValue                    // pointer to the shift value
);
The parameters of VCF distributions are as follows.

rc: Output value (double).
type1 arg1 [, type2 arg2 ...]: One or several distribution arguments.
*pU: A pointer to the percentile value, which must be in [0...1]. If this pointer = 0, the percentile will be randomly generated using the inbuilt Mersenne twister random number generator.
*pBoundsMode: Address of the <> flag - used for bounded distributions. If the first bit is 0, *pMin is interpreted as a percentile, otherwise as a value. If the second bit is 0, *pMax is interpreted as a percentile, otherwise as a value.
*pMin, *pMax: Pointers to the minimum and maximum limit values.
*pShiftValue: A pointer to the shift value.

Example of using the VoseDistribution core function

// Requires <stdio.h> and the ModelRisk SDK header declaring VoseNormal_Core and VoseCoreError.
void ExampleFunction()
{
    double x, U, Shift, max, min;
    int BoundMode;
    double mu = 0, sigma = 1;    // Normal(0, 1) parameters

    // Example of a call to the VoseNormal(0,1) distribution
    if (!VoseNormal_Core(x, mu, sigma, 0, 0, 0, 0, 0)) {
        printf("%s", VoseCoreError());
    } else {
        printf("Normal(0,1)=%.10g", x);
    }

    // Example of a call to the VoseNormal(0,1,0) distribution - with percentile = 0
    U = 0.0;
    if (!VoseNormal_Core(x, mu, sigma, &U, 0, 0, 0, 0)) {
        printf("%s", VoseCoreError());
    } else {
        printf("Normal(0,1)=%.10g", x);
    }

    // Example of a call to the VoseNormal(0,1,,VoseShift(10)) distribution - with a shift of 10
    Shift = 10;
    if (!VoseNormal_Core(x, mu, sigma, 0, 0, 0, 0, &Shift)) {
        printf("%s", VoseCoreError());
    } else {
        printf("Normal(0,1)=%.10g", x);
    }

    // Example of a call to the VoseNormal(0,1,,VosePBounds(0.3,0.8)) distribution - with randomly
    // generated percentile and bounded using percentiles for the minimum and maximum limits
    BoundMode = 3;    // set 1st and 2nd bits to 1
    min = 0.3;
    max = 0.8;
    if (!VoseNormal_Core(x, mu, sigma, 0, &BoundMode, &min, &max, 0)) {
        printf("%s", VoseCoreError());
    } else {
        printf("Normal(0,1)=%.10g", x);
    }
}
Appendix III

A compendium of distributions
Compiled by Michael van Hauwermeiren

The precision of a risk analysis relies very heavily on the appropriate use of probability distributions to represent the
uncertainty and variability of the problem accurately. In my experience, inappropriate use of probability distributions
has proved to be a very common failure of risk analysis models. It stems, in part, from an inadequate understanding
of the theory behind probability distribution functions and, in part, from failing to appreciate the knock-on effects
of using inappropriate distributions. This appendix is intended to alleviate the misunderstanding by providing a
practical insight into the various types of probability distribution in common use.
I decided in this third edition to place the compendium of distributions in an appendix because they are used
in so many different places within the book. This appendix gives a very complete summary of the distributions
used in risk analysis, an explanation of where and why they are used, some representative plots and the most
useful descriptive formulae from a risk analysis viewpoint. The distributions are given in alphabetical order. The
list comprises all the distributions that we have ever used at Vose Consulting (and have therefore included in
ModelRisk), so I am pretty confident that you will find the one you are looking for. Distributions often have
several different names depending on the application, so if you don't find the distribution you are searching for
here, please refer to the index which may suggest an alternative name.
Most risk analysis and statistical software offer a wide variety of distributions, so the choice can be bewildering.
I have therefore started this appendix with a list of different applications and the distributions that you might find
most useful. Then I offer a little guide on how to read the probability equations that feature so prominently in this
appendix: people's eyes tend to glaze over when they see probability equations, but with a few simple rules you
will be able rapidly to "read" the relevant parts of an equation and ignore the rest, which can give you a much
more intuitive feel for a distribution's behaviour.

III.1 Discrete and Continuous Distributions
The most basic distinguishing property between probability distributions is whether they are continuous or discrete.

III.1.1 Discrete distributions
A discrete distribution may take one of a set of identifiable values, each of which has a calculable probability of
occurrence. Discrete distributions are used to model parameters like the number of bridges a roading scheme may
need, the number of key personnel to be employed or the number of customers that will arrive at a service station
in an hour. Clearly, variables such as these can only take specific values: one cannot build half a bridge, employ
2.7 people or serve 13.6 customers.
The vertical scale of a relative frequency plot of a discrete distribution is the actual probability of occurrence,
sometimes called the probability mass. The sum of all these values must add up to 1.
Examples of discrete distributions are: binomial, geometric, hypergeometric, inverse hypergeometric, negative
binomial, Poisson and, of course, the generalised discrete distribution. Figure III.1 illustrates a discrete distribution modelling the number of footbridges that are to be built across a planned stretch of motorway. There is a 30 %

Figure III.1 Example of a discrete variable (x-axis: number of overbridges).

chance that six bridges will be built, a 10 % chance that eight bridges will be built, etc. The sum of these probabilities (10 % + 30 % + 30 % + 15 % + 10 % + 5 %) must equal unity.

III.1.2 Continuous distributions
A continuous distribution is used to represent a variable that can take any value within a defined range (domain).

For example, the height of an adult English male picked at random has a continuous distribution because the height
of a person is essentially infinitely divisible. We could measure his height to the nearest centimetre, millimetre,
tenth of a millimetre, etc. The scale can be repeatedly divided up, generating more and more possible values.
Properties like time, mass and distance that are infinitely divisible are modelled using continuous distributions. In
practice, we also use continuous distributions to model variables that are, in truth, discrete but where the gap between
allowable values is insignificant; for example, project cost (which is discrete with steps of one penny, one cent, etc.),
exchange rate (which is only quoted to a few significant figures), number of employees in a large organisation, etc.
The vertical scale of a relative frequency plot of an input continuous probability distribution is the probability
density. It does not represent the actual probability of the corresponding x-axis value since that probability is zero.
Instead, it represents the probability per x-axis unit of generating a value within a very small range around the
x-axis value.
In a continuous relative frequency distribution, the area under the curve must equal 1. This means that the
vertical scale must change according to the units used for the horizontal scale. For example, Figure III.2(a) shows a theoretical distribution of the cost of a project using Normal(£4 200 000, £350 000). Since this is a continuous distribution, the probability of the cost of the project being precisely £4M is zero. The vertical scale reads a value of 9.7 × 10⁻⁷ (about one in a million). The x-axis units are £1, so this y-axis reading means that there is a one in a million chance that the project cost will be £4M plus or minus 50p (a range of £1). By comparison, Figure III.2(b) shows the same distribution but using million pounds as the scale, i.e. Normal(4.2, 0.35). The y-axis value at x = £4M is 0.97, 1 million times the above value. This does not, however, mean that there is a 97 % chance of being between £3.5M and £4.5M, because the probability density varies very considerably over this range. The logic used in interpreting the 9.7 × 10⁻⁷ value for Figure III.2(a) is an approximation that is valid there because the probability density is essentially constant over that range (£4M ± 50p).
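The two readings are easy to verify with any statistics library; for instance, this short Python check reproduces the values quoted above:

from statistics import NormalDist

# Same distribution, two different x-axis units.
print(NormalDist(4_200_000, 350_000).pdf(4_000_000))   # about 9.7e-7 (density per £1)
print(NormalDist(4.2, 0.35).pdf(4.0))                   # about 0.97 (density per £1M)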

III.2 Bounded and Unbounded Distributions
A distribution that is confined to lie between two determined values is said to be bounded. Examples of bounded
distributions are: uniform - between minimum and maximum; triangular - between minimum and maximum;
beta - between 0 and 1; and binomial - between 0 and n.
A distribution that is unbounded theoretically extends from minus infinity to plus infinity. Examples are: normal,
logistic and extreme value.
A distribution that is constrained at either end is said to be partially bounded. Examples are: chi-square (> 0), exponential (> 0), Pareto (> a), Poisson (≥ 0) and Weibull (> 0).

Figure III.2 Example of the effect on the y scale of a probability density function of changing the x-scale units: (a) Project Cost (£): Normal(4 200 000, 350 000), units of £1; (b) Project Cost (£M): Normal(4.2, 0.35), units of £1M.

Unbounded and partially bounded distributions may, at times, need to be constrained to remove the tail of the
distribution so that nonsensical values are avoided. For example, using a normal distribution to model sales volume
opens up the chance of generating a negative value. If the probability of generating a negative value is significant,
and we want to stick to using a normal distribution, we must constrain the model in some way to eliminate any
negative sales volume figure being generated.
Monte Carlo simulation software usually provide truncated distributions for this purpose as well as filtering
facilities. ModelRisk uses the XBounds or PBounds functions to bound a univariate distribution at either specified
values or percentiles respectively. One can also build logic into the model that rejects nonsensical values. For
example, using the IF function A2: = IF(A1 < 0, ERR(), 0) only allows values into cell A2 from cell A1 that are
>O and produces an error in cell A2 otherwise. However, if there are several distributions being bounded this way,
or you are using extreme bounds, you will lose a lot of the iterations in your simulation. If you are faced with the
problem of needing to constrain the tail of a distribution, it is also worth questioning whether you are using the
appropriate distribution in the first place.

III.3 Parametric and Non-Parametric Distributions
There is a very useful distinction to be made between model-based parametric and empirical non-parametric distributions. By "model-based" I mean a distribution whose shape is born of the mathematics describing a theoretical
problem. For example: an exponential distribution is a waiting time distribution whose function is the direct result
of assuming that there is a constant instantaneous probability of an event occurring; a lognormal distribution is
derived from assuming that ln[x] is normally distributed, etc.
By "empirical distribution" I mean a distribution whose mathematics is defined by the shape that is required. For
example: a triangular distribution is defined by its minimum, mode and maximum values; a histogram distribution
is defined by its range, the number of classes and the height of each class. The defining parameters for general
distributions are features of the graph shape. Empirical distributions include: cumulative, discrete, histogram, relative, triangular and uniform. Chapters 10 and 14 discussed the reasons for using these distinctions. They fall under the "empirical distribution" or non-parametric class and are intuitively easy to understand, extremely flexible
and therefore very useful.
Model-based or parametric distributions require a greater knowledge of the underlying assumptions if they are
to be used properly. Without that knowledge, analysts may find it very difficult to justify the use of the chosen
distribution type and to gain peer confidence in their models. They will probably also find it difficult to make
alterations should more information become available.
I am a keen advocate of using non-parametric distributions. I believe that parametric distributions should only be
selected if: (a) the theory underpinning the distribution applies to the particular problem; (b) it is generally accepted
that a particular distribution has proven to be very accurate for modelling a specific variable without actually having
any theory to support the observation; (c) the distribution approximately fits the expert opinion being modelled and
the required level of accuracy is not very high; (d) one wishes to use a distribution that has a long tail extending
beyond the observed minimum or maximum. These issues are discussed in more detail in Chapter 10.

III.4 Univariate and Multivariate Distributions


Univariate distributions describe a single parameter or variable and are used to model a parameter or variable that
is not probabilistically linked to any other in the model. Multivariate distributions describe several parameters or
variables whose values are probabilistically linked in some way. In most cases, we create the probabilistic links
via one of several correlation methods. However, there are a few multivariate distributions that have specific, very
useful purposes and are therefore worth studying more.

III.5 Lists of Applications and the Most Useful Distributions
Bounded versus unbounded
The following tables organise distributions according to whether their limits are bounded. Italics indicate non-parametric distributions.

Univariate Distributions

Left bounded (continuous): Lognormal (base e), Lognormal (base B), Pareto (first kind), Pareto (second kind), Pearson5, Pearson6, Rayleigh, Weibull

Left and right bounded (continuous): Beta, Beta4, Cumulative ascending, Cumulative descending, Histogram, JohnsonB, Kumaraswamy, Kumaraswamy4, Modified PERT, Ogive, PERT, Reciprocal, Relative, Split triangle, Triangle, Uniform

Left and right bounded (discrete): Bernoulli, Beta-binomial, Binomial, Discrete, Discrete uniform, Hypergeometric, Step uniform

Multivariate Distributions

Multivariate hypergeometric

Frequency distributions
The following distributions are often used to model events that are counted, like outbreaks, economic shocks,
machine failures, deaths, etc.:
Bernoulli: Special case of the binomial with one individual who may convert.
Binomial: Used when there is a group of individuals (trials) that may convert (succeed). For example, used in life insurance to answer how many policyholders will claim in a year.
Delaporte: Events occur randomly with a randomly varying risk level; the most flexible in modelling frequency patterns.
Logarithmic: Peaks at 1, looks exponential.
NegBin: Events occur randomly with a randomly varying risk level, more restrictive than the Pólya.
Poisson: Events occur randomly with a constant risk level.
Pólya: Events occur randomly with a randomly varying risk level.

Risk impact

A risk is an event that may or may not occur, and the impact distribution describes the "cost" should it occur. For
most applications, a continuous distribution that is right skewed and left bounded is most applicable. For situations
like a foot and mouth disease outbreak, the number of sheep lost will of course be discrete, but such variables are
usually modelled with continuous distributions anyway (and you can use ROUND(..., 0), of course):
Bradford: Looks exponential with min and max bounds.
Burr: Appealing because of its flexibility of shape.
Dagum: Lognormal looking with two shape parameters for more control, and a scale parameter.
Exponential: Ski-slope shape defined by its mean.
Loggamma: Can have a very long right tail.
Loglogistic: Impact is a function of several variables that are either correlated or one dominates.
Lognormal: Impact is a function of several uncorrelated variables.
Ogive: Used to construct a distribution directly from data.
Pareto (two kinds): Ski-slope shape with the longest tail, so often used for conservative modelling of the extreme right tail.

Time or trials until ...

Beta-geometric: Failures before one beta-binomial success.
Beta-negative binomial: Failures before s beta-binomial successes.
Erlang: Time until m Poisson counts.
Exponential: Time until one Poisson count.
Fatigue life: Time until gradual breakdown.
Gamma: Time until α Poisson counts, but used more generally.
Geometric: Failures before one binomial success.
Inverse gaussian: Theoretical waiting time use is esoteric, but the distribution has some flexibility through its parameters.
Inverse hypergeometric: Failures before s hypergeometric successes.
Lognormal: Time until an event that is the product of many variables. Used quite generally.
Negative binomial: Failures before s binomial successes.
Negative multinomial: Failures before s multinomial successes.
Rayleigh: A special case of the Weibull. Also distance to nearest, Poisson-distributed, neighbour.
Weibull: Time until an event where the instantaneous likelihood of the event occurring changes (usually increases) over time. Used a great deal in reliability engineering.

Variations in a financial market

We used to be happy to assume that random variations in a stock's return or an interest rate were normally distributed, because normal distributions made the equations easier. Financial analysts now use simulation more, so they have become a bit more adventurous:

Cauchy: An extreme distribution a little like a normal but with infinite variance.
Extreme value (max, min): Models the extreme movement, but tricky to use.
Generalised error (aka GED, error): Very flexible distribution that will morph between a uniform, (nearly) a normal, Laplace, etc.
Inverse gaussian: Used in place of the lognormal when it has a right tail that is too heavy.
Laplace: Defined by mean and variance like the normal, but takes a tent shape. Favoured because it gives longer tails.
Lévy: Appealing because it belongs to the "stable" family of distributions, gives fatter tails than a normal.
Logistic: Like a normal but more peaked.
Lognormal: Assumes that the market is randomly affected by very many multiplicative random elements.
Normal: Assumes that the market is randomly affected by very many additive random elements.
Poisson: Used to model the occurrence of jumps in the market.
Student: A little like a normal, but with fatter tails, tending towards Cauchy tails. When rescaled and shifted, it is like the normal but with more kurtosis when ν is small.

How large something is

How much milk a cow will produce, how big will a sale be, how big a wave, etc. We'll often have data and want to fit them to a distribution, but which one?

Bradford: Like a truncated Pareto. Used in advertising, but worth looking at.
Burr: Appealing because of its flexibility of shape.
Dagum: Flexible. Has been used to model aggregate fire losses.
Exponential: Ski-slope shape peaking at zero and defined by its mean.
Extreme value: Models extremes (min, max) of variables belonging to the exponential family of distributions. Difficult to use. VoseLargest and VoseSmallest are much more flexible and transparent.
Generalised error (aka GED, error): Very flexible distribution that will morph between a uniform, (nearly) a normal, Laplace, etc.
Hyperbolic-secant: Like a normal but with narrower shoulders, so used to fit to data where a normal isn't quite right.
Inverse gaussian: Used in place of the lognormal when it has a right tail that is too heavy.
Johnson bounded: Can have any combination of skewness and kurtosis, so pretty flexible at fitting to data, but rarely used.
Loggamma: If the variable is the product of a number of exponentially distributed variables, it may look loggamma distributed.
LogLaplace: The asymmetric logLaplace distribution takes a pretty strange shape, but has a history of being fitted to particle size data and similar.
Loglogistic: Has a history of being fitted to data for a fair few financial variables.
Lognormal: See central limit theorem. Size is a function of the product of a number of random variables. Classic example is oil reserves = area * thickness * porosity * gas:oil ratio * recovery rate.
Normal: See central limit theorem. Size is a function of the sum of a number of random variables, e.g. a cow's milk yield may be a function of genetics and farm care and mental well-being (it's been proven) and nutrition and ...
Pareto: Ski-slope shape with the longest tail, so often used for conservative modelling of the extreme right tail, but generally fits the main body of data badly, so consider splicing (see the VoseSplice function, for example).
Rayleigh: Wave heights, electromagnetic peak power or similar.
Student: If the variance of a normal distribution is also a random variable (specifically chi-square), the variable will take a Student distribution. So think about something that should be roughly normal with constant mean but where the standard deviation is not constant, e.g. errors in measurement with varying quality of instrument or operator.
Weibull: Much like the Rayleigh, including modelling wind speed.

Expert estimates

The following distributions are often used to model subject matter experts' estimates because they are intuitive, easy to control and/or flexible:

Bernoulli: Used to model a risk event occurring or not.
Beta4: A min, max and two shape parameters. Can be reparameterised (i.e. the PERT distribution). Shape parameters are difficult to use.
Bradford: A min, max and a ski-slope shape in between with controllable drop.
Combined: Allows you to combine correctly several SME estimates for the same parameter and weight them.
Cumulative (ascending and descending): Good when the expert thinks of a series of "probability P of being below x".
Discrete: Specify several possible outcomes with weights for each.
Johnson bounded: VISIFIT software is available that will match to the expert estimate.
Kumaraswamy: Controllable distribution, similar to the Beta4.
Modified PERT: PERT distribution with extra control for spread.
PERT: A min, max and mode. Tends to place too little emphasis on tails if the distribution is quite skewed.
Relative: Allows you to construct your own shape.
Split triangle: Defined by low, medium and high percentiles. Splices two triangular distributions together. Intuitive.
Triangle: A min, mode and max. Some software also offer low and high percentiles as inputs. Tends to overemphasise tails.
Uniform: A min and max. Useful to flag when the SME (see Section 14.4) has very little idea.

III.6 How to Read Probability Distribution Equations
The intention of this section is to help you better understand how to read and use the equations that describe
distributions. For each distribution (except those with outrageously complicated moment formulae) in this appendix
I give the following equations:
probability mass function (for discrete distributions);
probability density function (for continuous distributions);
cumulative distribution function (where available);
mean;
mode;
variance;
skewness;
kurtosis.


There are many other distribution properties (e.g. moment-generating functions, raw moments), but they are of
little general use in risk analysis and would leave you facing yet more daunting pages of equations to wade through.

III.6.1 Location, scale and shape parameters
In this book, and in ModelRisk, we have parameterised distributions to reflect the most common usage, and where
there are two or more common parameterisations we have used the one that is most useful to model risk. So I
use, for example, mean and standard deviation a lot for consistency between distributions, or other parameters that
most readily connect to the stochastic process to which the distribution is most commonly applied. Another way to
describe parameters is to categorise them as location, scale and shape, which can disconnect the parameters from
their usual meaning but is sometimes helpful in understanding how a distribution will change with variation in the
parameter value.
A location parameter controls the position of the distribution on the x axis. It should therefore appear in the
same way in the equations for the mode and mean - two measures of location. So, if a location parameter increases
by 3 units, then the mean and mode should increase by 3 units. For example, the mean of a normal distribution
is also the mode, and can be called a location parameter. The same applies for the Laplace, for example. A lot
of distributions are extended by including a shift parameter (e.g. VoseShift), which has the effect of moving the
distribution along the x axis and is a location parameter.
A scale parameter controls the spread of the distribution on the x axis. Its square should therefore appear
in the equation for a distribution's variance. For example, β is the scale parameter for the gamma, Weibull and logistic distributions, σ for the normal and Laplace distributions, b for the ExtremeValueMax, ExtremeValueMin and Rayleigh distributions, etc.
A shape parameter controls the shape (e.g. skewness, kurtosis) of the distribution. It will appear in the pdf in a way that controls the manipulation of x in a non-linear fashion, usually as a power of x. For example, the Pareto(θ, a) distribution has the pdf

f(x) = \frac{\theta a^{\theta}}{x^{\theta+1}}

where θ is a shape parameter, as it changes the functional form of the relationship between f(x) and x. Other examples you can look at are ν for the GED, Student and chi-square distributions, and α for a gamma distribution. A distribution may sometimes have two shape parameters, e.g. α₁ and α₂ for the beta distribution, and ν₁ and ν₂ for the F distribution.
If there is no shape parameter, the distribution always takes the same shape (like the Cauchy, exponential,
extreme value, Laplace, logistic and normal).

III.6.2 Understanding distribution equations
Probability mass function (pmf) and probability density function (pdf)
The pmf or pdf is the most common equation used to define a distribution, for two reasons. The first is that it
gives the shape of the density (or mass) curve, which is the easiest way to recognise and review a distribution. The
second is that the pmf (or pdf) is always in a useful form, whereas the cdf frequently doesn't have a closed form
(meaning a simple algebraic identity rather than expressed as an integral or summation).
The pmfs must sum to 1, and the pdfs must integrate to 1, in order to obey the basic probability rule that the
sum of all probabilities equals 1. This means that a pmf or pdf equation has two parts: a function of x, the possible
value of the parameter; and a normalising part that normalises the distribution to sum to unity. For example, the
generalised error distribution pdf takes the (rather complicated) form

f(x) = \frac{K}{\beta}\exp\left[-\frac{1}{2}\left|\frac{x-\mu}{\beta}\right|^{\nu}\right]    (III.1)

where

K = \frac{\nu}{2^{1+1/\nu}\,\Gamma(1/\nu)}

The part that varies with x is simply

\exp\left[-\frac{1}{2}\left|\frac{x-\mu}{\beta}\right|^{\nu}\right]

so we can write

f(x) \propto \exp\left[-\frac{1}{2}\left|\frac{x-\mu}{\beta}\right|^{\nu}\right]    (III.2)

The rest of Equation (III.1), i.e. K/β, is a normalising constant for a given set of parameters and ensures that the area under the curve equals unity. Equation (III.2) is sufficient to define or recognise the distribution and allows us to concentrate on how the distribution behaves with changes to the parameter values. In fact, probability mathematicians frequently work with just the component that is a function of x, keeping in the back of their mind that it will be normalised eventually.

For example, the (x - μ) part shows us that the distribution is shifted μ along the x axis (a location parameter), and the division by β means that the distribution is rescaled by this factor (a scale parameter). The parameter ν changes the functional form of the distribution. For example, for ν = 2

f(x) \propto \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\beta}\right)^{2}\right]

Compare that with the normal distribution density function

f(x) \propto \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]

So we can say that, when ν = 2, the GED is normally distributed with mean μ and standard deviation β. The functional form (the part in x) gives us sufficient information to say this, as we know that the multiplying constant must adjust to keep the area under the curve equal to unity.

Similarly, for ν = 1 we have

f(x) \propto \exp\left[-\frac{1}{2}\left|\frac{x-\mu}{\beta}\right|\right]

which is the density function for the Laplace distribution. So we can say that, when ν = 1, the GED takes a Laplace(μ, β) distribution.

The same idea applies to discrete distributions. For example, the Logarithmic(θ) distribution has the pmf

p(x) = \frac{-1}{\ln(1-\theta)}\cdot\frac{\theta^{x}}{x}

where -1/ln(1 - θ) is the normalising part, because -ln(1 - θ) can be expressed as the infinite series

-\ln(1-\theta) = \sum_{x=1}^{\infty}\frac{\theta^{x}}{x}

so the probabilities sum to 1.


Cumulative distribution function (cdf)
The cdf gives us the probability of being less than or equal to the variable value x . For discrete distributions this
is simply the sum of the pmf up to x , so reviewing its equation is not more informative than the pmf equation.
However, for continuous distributions the cdf can take a simpler form than the corresponding pdf. For example,
for a Weibull distribution
f (x) = a,-ffxa-

ex,

[- (p )"1 m

xff- exp

[-

($)*I

The latter is simpler to envisage.
Many cdfs have a component that involves the exponential function (e.g. Weibull, exponential, extreme value,
Laplace, logistic, Rayleigh). Exp(-co) = 0 and Exp(0) = 1, which is the range of F(x), so you'll often see
functions of the form
F(x) = exp(-g(x))

where g(x) is some function of x that goes from zero to infinity or infinity to zero monotonically (meaning always
increasing) with increasing x. For example, Equation (111.3) for the Weibull distribution shows us:
The value

scales x .

When x = 0, F ( x ) = 1 - 1 = 0, so the variable has a minimum of 0.
When x = co, F ( x ) = 1 - 0 = 1, so the variable has a maximum of co.
a makes the distribution shorter, because it "amplifies" x . For example (leaving
calculates 32 = 9, whereas if a = 4 it calculates 34 = 81.

B

= l), if a = 2 and x = 3 it

Mean μ
The mean of a probability distribution is useful to know for several reasons:
It gives a sense of the location of the distribution.
The central limit theorem (CLT) uses the mean.
Knowing the equation of the mean can help us understand the distribution. For example, a Gamma(α, β) distribution can be used to model the time to wait to observe α independent events that occur randomly in time with a mean time to occurrence of β. It makes intuitive sense that, "on average", you need to wait αβ, which is the mean of the distribution.
We sometimes want to approximate one distribution with another to make the mathematics easier. Knowing
the equations for the mean and variance can help us find a distribution with these same moments.
Because of CLT, the mean propagates through a model much more precisely than the mode or median. So, for
example, if you replaced a distribution in a simulation model with its mean, the output mean value will usually
be close to the output mean when the model includes that distribution. However, the same does not apply as
well by replacing a distribution with its median, and often much worse still if one uses the mode.
We can determine the mean and other moments of an aggregate distribution if we know the mean and other
moments of the frequency and severity distributions.
A distribution is often fitted to data by matching the data's mean and variance to the mean and variance equations of the distribution - a technique known as the method of moments (a short sketch follows this list).
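As a minimal sketch of the method of moments (generic Python with invented data, matching a gamma distribution whose mean is αβ and variance is αβ²):

import statistics

data = [4.1, 6.3, 5.2, 7.9, 3.8, 5.5, 6.7, 4.9]   # invented sample data

m = statistics.mean(data)
v = statistics.variance(data)

# Gamma(alpha, beta): mean = alpha * beta, variance = alpha * beta^2,
# so beta = variance / mean and alpha = mean / beta.
beta = v / m
alpha = m / beta
print(alpha, beta)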

When the pdf of a distribution is of the form f(x) = g(x - z), where g() is any function and z is a fixed value, the equation for the mean will be a linear function of z.

Mode
The mode is the location of the peak of a distribution and is the most intuitive parameter to consider - the "most
likely value to occur".
If the mode has the same equation as the mean, it tells us the distribution is symmetric. If the mode is less than
the mean (e.g. for the gamma distribution, mode = (α - 1)β and mean = αβ) we know the distribution is right skewed; if the mode is greater than the mean the distribution is left skewed. The mode is our "best guess", so it can be informative to see how the mode varies with the distribution's parameters. For example, the Beta(α, β) has a mode of

\frac{\alpha-1}{\alpha+\beta-2}

A Beta(s + 1, n - s + 1) distribution is often used to estimate a binomial probability where we have observed s successes in n trials. This gives a mode of s/n: the fraction of our trials that were successes is our "best guess" at the true (long-run) probability, which makes intuitive sense.

Variance V
The variance gives a measure of the spread of a distribution. I give equations for the variance rather than the standard deviation because it avoids having square-root signs all the time, and because probability mathematicians work in terms of variance rather than standard deviation. However, it can be useful to take the square root of the variance equation (i.e. the standard deviation σ) to help make more sense of it. For example, the Logistic(α, β) distribution has variance

V = \frac{\pi^{2}\beta^{2}}{3}

which shows us that β is a scaling parameter: the distribution's spread is proportional to β. Another example - the Pareto(θ, a) distribution - has variance

V = \frac{\theta a^{2}}{(\theta-1)^{2}(\theta-2)}

which shows us that a is a scaling parameter.

Skewness S
Skewness and kurtosis equations are not that important, so feel free to skip this bit. Skewness is the expected value of (x - μ)³ divided by V^{3/2}, so you'll often see a cubed term or a power of 3/2 in the equation. You can tell whether a distribution is left or right skewed, and when, by looking at this equation, bearing in mind the possible values of each parameter.

For example, the skewness equation for the negative binomial is

S = \frac{2-p}{\sqrt{s(1-p)}}

Since p lies on (0, 1) and s is a positive integer, the skewness is always positive.
The beta distribution has skewness

S = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}}

and, since α and β are > 0, this means because of the (β - α) term it has negative skewness when α > β, positive skewness when α < β and zero skewness when α = β.
An exponential distribution has a skewness of 2, which I find a helpful gauge against which to compare.
Kurtosis K

Kurtosis is the expected value of (x - μ)⁴ divided by V², so you'll often see a fourth-power term in the equation.
The normal distribution has a kurtosis of 3, and that's what we usually compare against (the uniform distribution has a kurtosis of 1.8, and the Laplace a kurtosis of 6, which are two fairly extreme points of reference).
The Poisson(λ) distribution, for example, has a kurtosis of

K = 3 + \frac{1}{\lambda}

which means, when taken together with the behaviour of other moments, the bigger the value of λ, the closer the distribution is to a normal.
The same story applies for the Student(ν) distribution, for example, which has a kurtosis of

K = 3 + \frac{6}{\nu-4}

so the larger ν, the closer the kurtosis is to 3.
The kurtosis of a Lognormal(μ, σ) distribution is z⁴ + 2z³ + 3z² - 3, where

z = 1 + \left(\frac{\sigma}{\mu}\right)^{2}

What does that imply about when the lognormal will look normal?


III.7 The Distributions
III.7.1 Univariate distributions

Bernoulli
VoseBernoulli(p)
Graphs
The Bernoulli distribution is a binomial distribution with n = 1. The Bernoulli distribution returns a 1 with probability p and a 0 otherwise.

Uses
The Bernoulli distribution, named after Swiss scientist Jakob Bernoulli, is very useful for modelling a risk event
that may or may not occur.
VoseBernoulli(0.2) * VoseLognormal(12, 72) models a risk event with a probability of occurring of 20 % and an impact, should it occur, equal to Lognormal(12, 72).
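A minimal Monte Carlo sketch of that construction in Python (numpy assumed); since numpy parameterises the lognormal by the underlying normal's parameters, the Lognormal(12, 72) mean and standard deviation are converted first:

    import numpy as np

    # Risk event: occurs with probability 0.2; impact, if it occurs, ~ Lognormal(mean 12, sd 72).
    rng = np.random.default_rng(7)
    p, mean, sd = 0.2, 12.0, 72.0

    sigma_n = np.sqrt(np.log(1 + (sd / mean) ** 2))
    mu_n = np.log(mean) - sigma_n ** 2 / 2

    n = 100_000
    occurs = rng.random(n) < p                   # Bernoulli(0.2)
    impact = rng.lognormal(mu_n, sigma_n, n)     # Lognormal with mean 12, sd 72
    loss = occurs * impact                       # 0 when the event does not occur

    print(loss.mean())                           # ~ 0.2 * 12 = 2.4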
Equations

Probability mass function         f(x) = p^x (1 − p)^(1−x)   for x = 0, 1
Cumulative distribution function  F(0) = 1 − p,  F(1) = 1
Parameter restriction             0 ≤ p ≤ 1
Domain                            x = {0, 1}
Mean                              p
Mode                              0 if p < 0.5; 1 if p > 0.5; 0 and 1 if p = 0.5
Variance                          p(1 − p)
Skewness                          (1 − 2p) / √(p(1 − p))
Kurtosis                          3 + (1 − 6p(1 − p)) / (p(1 − p))

Beta

VoseBeta(α, β)
Graphs

Uses

The beta distribution has two main uses:
As the description of uncertainty or random variation of a probability, fraction or prevalence.
As a useful distribution one can rescale and shift to create distributions with a wide range of shapes and over
any finite range. As such, it is sometimes used to model expert opinion, for example in the form of the PERT
distribution.
The beta distribution is the conjugate prior (meaning it has the same functional form, and is therefore also often called the "convenience prior") to the binomial likelihood function in Bayesian inference and, as such, is often used to describe the uncertainty about a binomial probability, given that a number of trials n have been made with a number of recorded successes s. In this situation, α is set to the value (s + x) and β is set to (n − s + y), where Beta(x, y) is the prior.
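A sketch of that Bayesian update in Python (numpy assumed; the uniform Beta(1, 1) prior and the counts are illustrative):

    import numpy as np

    # Posterior for a binomial probability with a Beta(x, y) prior: alpha = s + x, beta = n - s + y.
    rng = np.random.default_rng(3)
    x_prior, y_prior = 1, 1
    s, n = 14, 50                                 # 14 successes observed in 50 trials

    alpha, beta_ = s + x_prior, n - s + y_prior
    posterior = rng.beta(alpha, beta_, 100_000)   # uncertainty about p

    print(posterior.mean(), np.percentile(posterior, [2.5, 97.5]))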

Equations

Probability density function      f(x) = (x − min)^(α−1) (max − x)^(β−1) / [B(α, β)(max − min)^(α+β−1)]
                                  where B(α, β) is a beta function
Cumulative distribution function  No closed form
Parameter restriction             α > 0, β > 0, min < max
Domain                            min ≤ x ≤ max
Mean                              min + α/(α + β) (max − min)
Mode                              min + (α − 1)/(α + β − 2) (max − min)   if α > 1, β > 1
                                  min, max                                if α < 1, β < 1
                                  min                                     if α < 1, β ≥ 1 or if α = 1, β > 1
                                  max                                     if α ≥ 1, β < 1 or if α > 1, β = 1
                                  does not uniquely exist                 if α = 1, β = 1
Variance                          αβ (max − min)² / [(α + β)²(α + β + 1)]
Skewness                          2(β − α)√(α + β + 1) / [(α + β + 2)√(αβ)]
Kurtosis                          3(α + β + 1)[2(α + β)² + αβ(α + β − 6)] / [αβ(α + β + 2)(α + β + 3)]

Beta4

VoseBeta4(α, β, min, max)
Graphs

Uses
See the beta distribution
Equations
Probability density function

(X - min)ol-l (max -x)B-'
B(a, B)(max - min)ff+fi-l
where B(a, B) is a beta function

f

(x) =

Cumulative distribution function

No closed form

Parameter restriction

a > 0 , B > 0 , min < max

Domain

min 5 x 5 max

60 1

602

Risk Analysis

Mean

+-a +a B (max - min)

min

,

min

+a +a p- 1- 2

(max-min)

min, max
rnin
max
does not uniauelv exist
Variance

aB
( a + @ )2 ( a + B

ifa > 1,

p

> 1

ifa lorifa=l,p> 1
ifa>1,/3<1orifa>l,p=I
if a = 1. B = 1

+ 1) (max - min12

Skewness
1

Kurtosis

(a

+ b + 1) ( 2 ( a + 812+ a B ( a + p - 6))
B

(a

+ B + 2 ) ( a + B +3)

Beta-Binomial
VoseBetaBinomial(n, α, β)

Graphs

A beta-binomial distribution returns a discrete value between 0 and n. Examples of a Beta-Binomial(30, 10, 7)
and a Beta-Binomial(20, 12, 10) distribution are given below.

Uses

The beta-binomial distribution is used to model the number of successes in n binomial trials (usually Binomial(n, p)) where the probability of success p is itself a random variable that can be adequately described by a beta distribution.
The extreme flexibility of the shape of the beta distribution means that it is often a very fair representation of the randomness of p.


The probability of success varies randomly, but in any one scenario that probability applies to all trials. For
example, you might consider using the beta-Binomial distribution to model:
the number of cars that crash in a race of n cars, where the predominant factor is not the skill of the individual
driver but the weather on the day;
the number of bottles of wine from a producer that are bad, where the predominant factor is not how each
bottle is treated but something to do with the batch as a whole;
the number of people who get ill at a wedding from n invited, all having a taste of the delicious soufflé,
unfortunately made with contaminated eggs, where their risk is dominated not by their individual immune
system, or the amount they eat, but by the level of contamination of the shared meal.
Comments

The beta-binomial distribution is always more spread than its best-fitting binomial distribution, because the beta
distribution adds extra randomness. Thus, when a binomial distribution does not match observations, because the
observations exhibit too much spread, a beta-binomial distribution is often used instead.
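A sketch of the construction in Python (numpy assumed), echoing the Beta-Binomial(30, 10, 7) example above; the extra spread relative to a binomial with the same mean shows up in the variance:

    import numpy as np

    # Beta-binomial as a mixture: draw p ~ Beta(alpha, beta), then successes ~ Binomial(n, p).
    rng = np.random.default_rng(11)
    n, alpha, beta_ = 30, 10, 7
    size = 100_000

    p = rng.beta(alpha, beta_, size)
    x = rng.binomial(n, p)                  # Beta-Binomial(30, 10, 7) samples

    p_bar = alpha / (alpha + beta_)
    y = rng.binomial(n, p_bar, size)        # binomial with the same mean

    print(x.mean(), y.mean())               # both ~ 17.6
    print(x.var(), y.var())                 # the beta-binomial variance is larger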
Equations

Probability mass function         f(x) = C(n, x) Γ(α + x) Γ(n + β − x) Γ(α + β) / [Γ(α + β + n) Γ(α) Γ(β)]
Cumulative distribution function  F(x) = Σ_{i=0}^{x} C(n, i) Γ(α + i) Γ(n + β − i) Γ(α + β) / [Γ(α + β + n) Γ(α) Γ(β)]
Parameter restriction             α > 0; β > 0; n = {0, 1, 2, ...}
Domain                            x = {0, 1, 2, ..., n}
Mean                              nα / (α + β)
Mode                              n if α > 1, β < 1 or if α > 1, β = 1; does not uniquely exist if α = 1, β = 1
Variance                          nαβ(α + β + n) / [(α + β)²(α + β + 1)]
Skewness                          [(α + β + 2n)(β − α) / (α + β + 2)] √[(1 + α + β) / (nαβ(n + α + β))]
Kurtosis                          Complicated


Beta-Geometric
VoseBetaGeometric(α, β)

Graphs

Uses
The BetaGeometric(α, β) distribution models the number of failures that will occur in a binomial process before the first success is observed, where the binomial probability p is itself a random variable taking a Beta(α, β) distribution.

Equations

Probability mass function         f(x) = β Γ(α + β) Γ(α + x) / [Γ(α) Γ(α + β + x + 1)]
Cumulative distribution function  F(x) = Σ_{i=0}^{x} β Γ(α + β) Γ(α + i) / [Γ(α) Γ(α + β + i + 1)]
Parameter restriction             α > 0, β > 0
Domain                            x = {0, 1, 2, ...}
Mean                              α / (β − 1)   for β > 1
Mode                              0
Variance                          αβ(α + β − 1) / [(β − 1)²(β − 2)]   for β > 2
Skewness                          (1/V^(3/2)) αβ(α + β − 1)(2α + β − 1)(β + 1) / [(β − 3)(β − 2)(β − 1)³]   for β > 3
Kurtosis                          Complicated


Beta-Negative Binomial
VoseBetaNegBin(s, α, β)

Graphs

Uses
The Beta-Negative Binomial(s, α, β) distribution models the number of failures that will occur in a binomial process before s successes are observed, where the binomial probability p is itself a random variable taking a Beta(α, β) distribution.
Equations

Probability mass function         f(x) = Γ(s + x) Γ(α + β) Γ(α + x) Γ(β + s) / [Γ(s) Γ(x + 1) Γ(α) Γ(β) Γ(α + β + s + x)]
Cumulative distribution function  F(x) = Σ_{i=0}^{x} Γ(s + i) Γ(α + β) Γ(α + i) Γ(β + s) / [Γ(s) Γ(i + 1) Γ(α) Γ(β) Γ(α + β + s + i)]
Parameter restriction             s > 0, α > 0, β > 0
Domain                            x = {0, 1, 2, ...}
Mean                              sα / (β − 1)   for β > 1
Variance                          sα(sα + sβ − s + β² − 2β + αβ − α + 1) / [(β − 2)(β − 1)²] = V   for β > 2
Skewness                          (1/V^(3/2)) sα(α + β − 1)(2α + β − 1)(s + β − 1)(2s + β − 1) / [(β − 3)(β − 2)(β − 1)³]   for β > 3
Kurtosis                          Complicated


Binomial
VoseBinomial(n, p)
Graphs

Uses

The binomial distribution models the number of successes from n independent trials where there is a probability
p of success in each trial.
The binomial distribution has an enormous number of uses. Beyond simple binomial processes, many other
stochastic processes can be usefully reduced to a binomial process to resolve problems. For example:
Binomial process:
- number of false starts of a car in n attempts;
- number of faulty items in n from a production line;
- number of n randomly selected people with some characteristic.
Reduced to binomial:
- number of machines that last longer than T hours of operation without failure;
- blood samples that have 0 or >O antibodies;
- approximation to a hypergeometric distribution.
Comments

The binomial distribution makes the assumption that the probability p remains the same no matter how many trials are performed: that would imply, for example, that my aim doesn't get better or worse. It wouldn't be a good model, for instance, if the chance of success improved with the number of trials.
Another example: the number of faulty computer chips in a batch of 2000, where there is a 2 % probability that any one chip is faulty, is Binomial(2000, 2 %).
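That example as a short simulation sketch in Python (numpy assumed):

    import numpy as np

    rng = np.random.default_rng(5)
    faulty = rng.binomial(n=2000, p=0.02, size=100_000)   # Binomial(2000, 2 %)
    print(faulty.mean(), faulty.std())                    # ~ 40 faulty chips, sd ~ 6.3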


Equations

Probability mass function         f(x) = C(n, x) p^x (1 − p)^(n−x)
Cumulative distribution function  F(x) = Σ_{i=0}^{⌊x⌋} C(n, i) p^i (1 − p)^(n−i)
Parameter restriction             0 ≤ p ≤ 1; n = {0, 1, 2, ...}
Domain                            x = {0, 1, 2, ..., n}
Mean                              np
Mode                              p(n + 1) − 1 and p(n + 1)   if p(n + 1) is an integer
                                  ⌊p(n + 1)⌋                   otherwise
Variance                          np(1 − p)
Skewness                          (1 − 2p) / √(np(1 − p))
Kurtosis                          3 + (1 − 6p(1 − p)) / (np(1 − p))

Bradford
VoseBradford(θ, min, max)
Graphs

Comments

The Bradford distribution (also known as the "Bradford law of scattering") is similar to a Pareto distribution that
has been truncated on the right. It is right skewed, peaking at its minimum. The greater the value of θ, the faster its
density decreases as one moves away from the minimum. Its genesis is essentially empirical, and very similar to


the idea behind the Pareto too. Samuel Clement Bradford originally developed data by studying the distribution of
articles in journals in two scientific areas, applied geophysics and lubrication. He studied the rates at which articles
relevant to each subject area appeared in journals in those areas. He identified all journals that published more than
a certain number of articles in the test areas per year, as well as in other ranges of descending frequency. He wrote
(Bradford, 1948, p. 116):

If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as 1 : n : n² . . .
Bradford only identified three zones. He found that the value of n was roughly 5. So, for example, if a study on a topic discovers that six journals contain one-third of the relevant articles found, then 6 × 5 = 30 journals will, among them, contain another third of all the relevant articles found, and the last third will be the most scattered of all, being spread out over 6 × 5² = 150 journals.
Bradford's observations are pretty robust. The theory has a lot of implications for research into, and investment in, periodicals; for example, how many journals an institute should subscribe to, or how many one should review in a study. It also gives a guide for advertising by identifying the first third of journals that have the highest impact, helps determine whether journals on a new(ish) topic (or arena like e-journals) have reached a stabilised population, and tests the efficiency of web browsers.

Equations

Probability density function      f(x) = θ / [(θ(x − min) + max − min) log(θ + 1)]
Cumulative distribution function  F(x) = log(1 + θ(x − min)/(max − min)) / log(θ + 1)
Parameter restriction             θ > 0, min < max
Domain                            min ≤ x ≤ max
Mean                              [θ(max − min) + k(min(θ + 1) − max)] / (θk)   where k = log(θ + 1)
Mode                              min
Variance                          (max − min)²[θ(k − 2) + 2k] / (2θk²)
Skewness                          Complicated
Kurtosis                          Complicated


Burr
VoseBurr(a, b, c, d)

Graphs
The Burr distribution (type III of the list originally presented by Burr) is a right-skewed distribution bounded at
a; b is a scale parameter, while c and d control its shape. Burr(0, 1, c , d ) is a unit Burr distribution. Examples of
the Burr distribution are given below.

Uses
The Burr distribution has a flexible shape and controllable scale and location that make it appealing to fit to data. It
has, for example, been found to fit tree trunk diameter data for the lumber industry. It is frequently used to model
insurance claim sizes, and is sometimes considered as an alternative to a normal distribution when data show slight
positive skewness.
Equations
Probability density function      f(x) = cd z^(−c−1) / [b(1 + z^(−c))^(d+1)]   where z = (x − a)/b
Cumulative distribution function  F(x) = (1 + z^(−c))^(−d)
Parameter restriction             b > 0, c > 0, d > 0
Domain                            x ≥ a
Mean                              a + b Γ(1 − 1/c) Γ(d + 1/c) / Γ(d)
Mode                              a + b ((cd − 1)/(c + 1))^(1/c)   if cd > 1
                                  a                                 otherwise
Variance                          b² k / Γ²(d)
                                  where k = Γ(d) Γ(1 − 2/c) Γ(d + 2/c) − Γ²(1 − 1/c) Γ²(d + 1/c)
Skewness                          [Γ²(d) Γ(1 − 3/c) Γ(d + 3/c) − 3Γ(d) Γ(1 − 1/c) Γ(1 − 2/c) Γ(d + 1/c) Γ(d + 2/c)
                                  + 2Γ³(1 − 1/c) Γ³(d + 1/c)] / k^(3/2)
Kurtosis                          Complicated

Cauchy
VoseCauchy(a, b)
Graphs
The standard Cauchy distribution is derived from the ratio of two independent normal distributions, i.e. if X and Y are two independent Normal(0, 1) distributions, then

    X/Y = Cauchy(0, 1)

The Cauchy(a, b) is shifted to have a median at a, and to have b times the spread of a Cauchy(0, 1). Examples
of the Cauchy distribution are given below.
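A quick numerical illustration of that ratio construction (Python, numpy assumed); it also checks the comment below that the range a − b to a + b contains 50 % of the probability:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(0, 1, 100_000)
    y = rng.normal(0, 1, 100_000)
    c = x / y                           # Cauchy(0, 1) samples

    q25, q75 = np.percentile(c, [25, 75])
    print(q25, q75)                     # ~ -1 and ~ +1, i.e. the interquartile range is a +/- b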


Uses
The Cauchy distribution is not often used in risk analysis. It is used in mechanical and electrical theory, physical anthropology and measurement and calibration problems. For example, in physics it is usually called a
Lorentzian distribution, where it is the distribution of the energy of an unstable state in quantum mechanics. It is also used to model the points of impact on a fixed straight line of particles emitted from a point source.
The most common use of a Cauchy distribution is to show how "smart" you are by quoting it whenever
someone generalises about how distributions are used, because it is the exception in many ways: in principle, it has
no defined mean (although by symmetry this is usually accepted as being its median = a), and no other defined
moments.

Comments
The distribution is symmetric about a and the spread of the distribution increases with increasing b. The Cauchy
distribution is peculiar and most noted because none of its moments is well defined (i.e. mean, standard deviation,
etc.), their determination being the difference between two integrals that both sum to infinity. Although it looks
similar to the normal distribution, it has much heavier tails. From X / Y = Cauchy(0, 1) above you'll appreciate
that the reciprocal of a Cauchy distribution is another Cauchy distribution (it is just swapping the two normal
distributions around). The range a - b to a + b contains 50 % of the probability area.

Equations

Probability density function      f(x) = 1 / [πb(1 + ((x − a)/b)²)]
Cumulative distribution function  F(x) = 1/2 + (1/π) arctan((x − a)/b)
Parameter restriction             b > 0
Domain                            −∞ < x < +∞
Chi

Equations

Mean                              √2 Γ((ν + 1)/2) / Γ(ν/2) = μ
Mode                              √(ν − 1)   for ν ≥ 1
Variance                          ν − μ² = V
Skewness                          Complicated
Kurtosis                          Complicated

Chi-squared
VoseChiSq(v)
Graphs
The chi-squared distribution is a right-skewed distribution bounded at zero. ν is called the "degrees of freedom" from its use in statistics, described below.

Uses
The sum of the squares of ν unit normal distributions (i.e. Normal(0, 1)²) is a ChiSq(ν) distribution: so ChiSq(2) = Normal(0, 1)² + Normal(0, 1)², for example. It is this property that makes it very useful in statistics, particularly classical statistics.
In statistics we collect a set of observations and, from calculating some sample statistics (the mean, variance, etc.), attempt to infer something about the stochastic process from which the data came. If the samples are from a normally distributed population, then the sample variance is a random variable that is a shifted, rescaled ChiSq distribution.
The chi-squared distribution is also used to determine the goodness of fit (GOF) of a distribution to a histogram of the available data (a ChiSq test). The method attempts to make a ChiSq-distributed statistic by taking the sum of squared errors, normalising them to be N(0, 1).
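A quick check of that sum-of-squares construction (a Python sketch, numpy assumed; ν = 5 is illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    nu = 5
    z = rng.normal(0, 1, size=(100_000, nu))
    chisq = (z ** 2).sum(axis=1)           # ChiSq(5) built from 5 squared unit normals

    print(chisq.mean(), chisq.var())       # ~ nu = 5 and ~ 2*nu = 10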


In our view, the ChiSq tests and statistics get overused (especially the GOF statistic) because the normality
assumption is often tenuous.
Comments

As ν gets large, it is the sum of a large number of [N(0, 1)]² distributions and, through the central limit theorem, approximates a normal distribution itself.
Sometimes written as χ²(ν). Also related to the gamma distribution: ChiSq(ν) = Gamma(ν/2, 2).
Equations
Probability density function      f(x) = x^(ν/2 − 1) exp(−x/2) / [2^(ν/2) Γ(ν/2)]
Cumulative distribution function  No closed form
Parameter restriction             ν > 0, ν is an integer
Domain                            x ≥ 0
Mean                              ν
Mode                              0 if ν < 2; ν − 2 otherwise
Variance                          2ν
Skewness                          √(8/ν)
Kurtosis                          3 + 12/ν

Cumulative Ascending
VoseCumulA(min, max, {xᵢ}, {Pᵢ})
Graphs


Uses
1. Empirical distribution of data. The cumulative distribution is very useful for converting a set of data values
into a first- or second-order empirical distribution.

2. Building a statistical confidence distribution. The cumulative distribution can be used to construct uncertainty distributions when using some classical statistical methods. Examples: p in a binomial process with s observed successes in n trials (F(x) = 1 − VoseBinomialProb(s, n, p, 1) + 0.5 * VoseBinomialProb(s, n, p, 0)); and λ in a Poisson process with a events observed in some unit of exposure (F(x) = 1 − VosePoissonProb(a, λ, 1) + 0.5 * VosePoissonProb(a, λ, 0)).
3. Modelling expert opinion. The cumulative distribution is used in some texts to model expert opinion. The expert
is asked for a minimum, maximum and a few percentiles (e.g. 25 %, 50%, 75 %). However, we have found
it largely unsatisfactory because of the insensitivity of its probability scale. A small change in the shape of
the cumulative distribution that would pass unnoticed produces a radical change in the corresponding relative
frequency plot that would not be acceptable.
The cumulative distribution is however very useful to model an expert's opinion of a variable whose range
covers several orders of magnitude in some sort of exponential way. For example, the number of bacteria in a kg
of meat will increase exponentially with time. The meat may contain 100 units of bacteria or 1 million. In such
circumstances, it is fruitless to attempt to use a relative distribution directly.
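A sketch of use 1 in Python (numpy assumed; the data are synthetic): a first-order empirical cumulative distribution is built from sorted data and then sampled by inverting the piecewise-linear cdf.

    import numpy as np

    rng = np.random.default_rng(8)
    data = np.sort(rng.lognormal(2.0, 0.8, 200))         # stand-in for observed values

    p = np.arange(1, len(data) + 1) / (len(data) + 1)    # cumulative probabilities
    u = rng.random(10_000)
    samples = np.interp(u, p, data)                      # inverse-cdf sampling

    print(samples.mean(), data.mean())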
Equations

Probability density function      f(x) = (Pᵢ₊₁ − Pᵢ) / (xᵢ₊₁ − xᵢ)   for xᵢ ≤ x < xᵢ₊₁, i ∈ {0, 1, ..., n}
                                  where x₀ = min, xₙ₊₁ = max, P₀ = 0, Pₙ₊₁ = 1
Cumulative distribution function  F(x) = Pᵢ + (x − xᵢ)/(xᵢ₊₁ − xᵢ) (Pᵢ₊₁ − Pᵢ)   for xᵢ ≤ x < xᵢ₊₁, i ∈ {0, 1, ..., n}
Parameter restriction             0 ≤ Pᵢ ≤ 1, Pᵢ ≤ Pᵢ₊₁
Domain                            min ≤ x ≤ max
Mean                              Σ_{i=0}^{n} (xᵢ₊₁ + xᵢ)/2 (Pᵢ₊₁ − Pᵢ)
Mode                              No unique mode
Variance                          Complicated
Skewness                          Complicated
Kurtosis                          Complicated

Cumulative Descending
VoseCumulD(min, max, {xᵢ}, {Pᵢ})

Graphs
This is another form of the cumulative distribution, but here the list of cumulative probabilities gives the probabilities
of being greater than or equal to their corresponding x values.
Examples of the cumulative descending distribution are given below.


Uses and equations

See the cumulative ascending distribution (only here Pᵢ₊₁ ≤ Pᵢ, so the Pᵢ values can be converted to those of the CumulA distribution by subtracting them from 1: Pᵢ′ = 1 − Pᵢ).

Dagum
VoseDagum(a, b, p)
Graphs


The Dagum distribution is often encountered in the actuarial literature or the income distribution literature.
The distribution was originally derived by Dagum while studying the elasticity of the cdf of income.
a and p are shape parameters and b is a scale parameter.
Uses
The Dagum distribution is sometimes used for fitting to aggregate fire losses.
Equations

Probability density function      f(x) = ap x^(ap−1) / [b^(ap) (1 + (x/b)^a)^(p+1)]
Cumulative distribution function  F(x) = [1 + (x/b)^(−a)]^(−p)
Parameter restriction             a > 0, b > 0, p > 0
Domain                            x > 0
Mean                              b Γ(p + 1/a) Γ(1 − 1/a) / Γ(p)   for a > 1
Mode                              b ((ap − 1)/(a + 1))^(1/a)   if ap > 1
                                  0                             else
Variance                          (b²/Γ²(p)) [Γ(p) Γ(p + 2/a) Γ(1 − 2/a) − Γ²(p + 1/a) Γ²(1 − 1/a)]   for a > 2
Skewness                          Complicated
Kurtosis                          Complicated

Notes
The Dagum distribution is also called the inverse Burr distribution or the kappa distribution.
When a = p, the distribution is also called the inverse paralogistic distribution.


Delaporte
VoseDelaporte(α, β, λ)
Graphs

Uses

A very common starting point for modelling the number of events that occur randomly distributed in time and/or space (e.g. the number of claims that will be received by an insurance company) is the Poisson distribution:

    Events = Poisson(λ)

where λ is the expected number of events during the period of interest. The Poisson distribution has a mean and variance equal to λ, and one often sees historic data (e.g. frequency of insurance claims) with a variance greater than the mean, so that the Poisson model underestimates the level of randomness. A standard method to incorporate greater variance is to assume that λ is itself a random variable (the resultant frequency distribution is called a mixed Poisson model). A Gamma(α, β) distribution is most commonly used to describe the random variation of λ between periods, so

    Events = Poisson(Gamma(α, β))                                    (1)

This is the Pólya(α, β) distribution.
Alternatively, one might consider that some part of the Poisson intensity is constant, with an additional component that is random, following a gamma distribution:

    Events = Poisson(λ + Gamma(α, β))                                (2)

This is the Delaporte distribution.
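A simulation sketch of Equation (2) in Python (numpy assumed; the values of λ, α and β are illustrative), showing the extra spread relative to a plain Poisson with the same mean:

    import numpy as np

    rng = np.random.default_rng(6)
    lam, alpha, beta_ = 3.0, 2.0, 1.5
    size = 100_000

    events = rng.poisson(lam + rng.gamma(alpha, beta_, size))   # Delaporte(alpha, beta, lam)
    plain = rng.poisson(lam + alpha * beta_, size)              # Poisson with the same mean

    print(events.mean(), plain.mean())   # both ~ lam + alpha*beta = 6
    print(events.var(), plain.var())     # ~ lam + alpha*beta*(1+beta) = 10.5 versus ~ 6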


We can split this equation up: since the sum of two independent Poisson variables is also Poisson, Equation (2) is equivalent to

    Events = Poisson(λ) + Poisson(Gamma(α, β)) = Poisson(λ) + Pólya(α, β)

Special cases of the Delaporte distribution: with λ = 0 it reduces to the Pólya(α, β) distribution, and as α or β tends to zero it reduces to the Poisson(λ) distribution.

Equations
Probability mass function         f(x) = Σ_{i=0}^{x} Γ(α + i) β^i λ^(x−i) e^(−λ) / [Γ(α) i! (1 + β)^(α+i) (x − i)!]
Cumulative distribution function  F(x) = Σ_{j=0}^{x} Σ_{i=0}^{j} Γ(α + i) β^i λ^(j−i) e^(−λ) / [Γ(α) i! (1 + β)^(α+i) (j − i)!]
Parameter restriction             α > 0, β > 0, λ > 0
Domain                            x = {0, 1, 2, ...}
Mean                              λ + αβ
Mode                              z, z + 1   if z is an integer, where z = (α − 1)β + λ
                                  ⌊z⌋        else
Variance                          λ + αβ(1 + β)
Skewness                          [λ + αβ(1 + 3β + 2β²)] / [λ + αβ(1 + β)]^(3/2)
Kurtosis                          [λ + 3λ² + αβ(1 + 6λ + 6λβ + 7β + 12β² + 6β³ + 3αβ + 6αβ² + 3αβ³)] / [λ + αβ(1 + β)]²

Discrete

VoseDiscrete({xᵢ}, {pᵢ})

Graphs
The discrete distribution is a general type of function used to describe a variable that can take one of several explicit discrete values {xᵢ} and where a probability weight {pᵢ} is assigned to each value; for example, the number of bridges to be built over a motorway extension or the number of times a software module will have to be recoded after testing. An example of the discrete distribution is shown below.


VoseDiscrete({4, 7, 8, 17, 20}, {20, 4, 11, 44, 24})

Uses

1. Probability branching. A discrete distribution is also particularly useful to describe probabilistic branching. For example, a firm estimates that it will sell Normal(120, 10) tonnes of weed killer next year unless a rival firm comes out with a competing product, in which case it estimates its sales will drop to Normal(85, 9) tonnes. It also estimates that there is a 30 % chance of the competing product appearing. This could be modelled by

    Sales = VoseDiscrete(A1:A2, B1:B2)

where the cells A1:B2 contain the formulae

    A1: =VoseNormal(120, 10)    B1: 70 %
    A2: =VoseNormal(85, 9)      B2: 30 %

A simulation sketch of this branching is given after this list.
2. Combining expert opinion. A discrete distribution can also be used to combine two or more conflicting expert opinions.
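The probabilistic branching sketch referred to in use 1 (Python, numpy assumed; values as above):

    import numpy as np

    rng = np.random.default_rng(9)
    n = 100_000

    competitor = rng.random(n) < 0.30                   # 30 % chance the rival product appears
    sales = np.where(competitor,
                     rng.normal(85, 9, n),              # sales if the competitor appears
                     rng.normal(120, 10, n))            # sales otherwise

    print(sales.mean())                                 # ~ 0.7*120 + 0.3*85 = 109.5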

Equations

Probability mass function         f(xᵢ) = pᵢ / Σ_{j=1}^{n} pⱼ
Cumulative distribution function  F(x) = Σ_{i=1}^{j} f(xᵢ)   for xⱼ ≤ x < xⱼ₊₁,
                                  assuming the xᵢ are in ascending order
Parameter restriction             pᵢ ≥ 0, n > 0, Σ pᵢ > 0
Domain                            x = {x₁, x₂, ..., xₙ}
Mean                              Σ_{i=1}^{n} xᵢ f(xᵢ) = μ
Mode                              The xᵢ with the greatest pᵢ
Variance                          Σ_{i=1}^{n} (xᵢ − μ)² f(xᵢ) = V
Skewness                          (1/V^(3/2)) Σ_{i=1}^{n} (xᵢ − μ)³ f(xᵢ)
Kurtosis                          (1/V²) Σ_{i=1}^{n} (xᵢ − μ)⁴ f(xᵢ)

Discrete Uniform
VoseDUniform({xᵢ})
Graph
The discrete uniform distribution describes a variable that can take one of several explicit discrete values with
equal probabilities of taking any particular value.

Uses
It is not often that we come across a variable that can take one of several values each with equal probability.
However, there are a couple of modelling techniques that require that capability:
Bootstrap. Resampling in univariate non-parametric bootstrap.
Fitting empirical distribution to data. Creating an empirical distribution directly from a dataset, i.e. where we
believe that the list of data values is a good representation of the randomness of the variable.
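A sketch of the bootstrap use in Python (numpy assumed; the data values are illustrative): each resample is n draws from DUniform({data}), i.e. equal probabilities with replacement.

    import numpy as np

    rng = np.random.default_rng(10)
    data = np.array([2.3, 4.1, 3.7, 5.0, 2.9, 4.4, 3.2, 4.8])   # illustrative observations

    # Univariate non-parametric bootstrap of the mean.
    boot_means = rng.choice(data, size=(10_000, data.size), replace=True).mean(axis=1)

    print(data.mean(), np.percentile(boot_means, [2.5, 97.5]))  # uncertainty about the mean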


Equations

Probability mass function         f(xᵢ) = 1/n,  i = 1, ..., n
Cumulative distribution function  F(x) = i/n   if xᵢ ≤ x < xᵢ₊₁, assuming the xᵢ are in ascending order
Parameter restriction             n > 0
Domain                            x = {x₁, x₂, ..., xₙ}
Mean                              (1/n) Σ_{i=1}^{n} xᵢ = μ
Mode                              Not uniquely defined
Variance                          (1/n) Σ_{i=1}^{n} (xᵢ − μ)² = V
Skewness                          (1/(n V^(3/2))) Σ_{i=1}^{n} (xᵢ − μ)³
Kurtosis                          (1/(n V²)) Σ_{i=1}^{n} (xᵢ − μ)⁴

Error Function
VoseErf(h)
Graphs

Uses
The error function distribution is derived from a normal distribution by setting μ = 0 and σ = 1/(h√2). Therefore, the uses are the same as those of the normal distribution.


Equations

Probability density function      f(x) = (h/√π) exp(−(hx)²)
Cumulative distribution function  F(x) = Φ(√2 hx)   Φ: error function (unit normal cdf)
Parameter restriction             h > 0
Domain                            −∞ < x < +∞
Mean                              0
Mode                              0
Variance                          1/(2h²)
Skewness                          0
Kurtosis                          3

Erlang

Graphs
The Erlang distribution (or m-Erlang distribution) is a probability distribution developed by A. K. Erlang. It is a
special case of the gamma distribution. A Gamma(m, β) distribution is equal to an Erlang(m, β) distribution when m is an integer.


Uses

The Erlang distribution is used to predict waiting times in queuing systems, etc., where a Poisson process is in
operation, in the same way as a gamma distribution.
Comments
A. K. Erlang worked a lot in traffic modelling. There are thus two other Erlang distributions, both used in modelling

traffic:
Erlang B distribution: this is the easier of the two, and can be used, for example, in a call centre to calculate the number of trunks one needs to carry a certain amount of phone traffic with a certain "target
service".
Erlang C distribution: this formula is much more difficult and is often used, for example, to calculate how long
callers will have to wait before being connected to a human in a call centre or similar situation.
Equations

Probability density function      f(x) = β^(−m) x^(m−1) exp(−x/β) / (m − 1)!
Cumulative distribution function  No closed form
Parameter restriction             m > 0 (integer), β > 0
Domain                            x ≥ 0
Mean                              mβ
Mode                              β(m − 1)
Variance                          mβ²
Skewness                          2/√m
Kurtosis                          3 + 6/m

Error
VoseError(μ, σ, ν)
Graphs
The error distribution goes by the names "exponential power distribution" and "generalised error distribution". This three-parameter distribution offers a variety of symmetric shapes, as shown in the figures below. The first pane shows the effect on the distribution's shape of varying parameter ν. Note that ν = 2 is a normal distribution, ν = 1 is a Laplace distribution and the distribution approaches a uniform as ν approaches infinity. The second pane shows the change in the distribution's spread by varying parameter σ, its standard deviation. Parameter μ is simply the location of the distribution's peak, and the distribution's mean.


Uses
The error distribution finds quite a lot of use as a prior distribution in Bayesian inference because it has greater
flexibility than a normal prior, in that the error distribution is flatter than a normal (platykurtic) when v > 2, and
more peaked than a normal distribution (leptokurtic) when v < 2. Thus, using the GED allows one to maintain the
same mean and variance, but vary the distribution's shape (via the parameter v) as required.
We have also seen the error distribution being used to model variations in historic UK property market returns.

Equations

Probability density function      f(x) = ν exp(−|(x − μ)/λ|^ν) / (2λ Γ(1/ν))   where λ = σ √(Γ(1/ν)/Γ(3/ν))
Cumulative distribution function  No closed form
Parameter restriction             −∞ < μ < +∞, σ > 0, ν > 0
Domain                            −∞ < x < +∞
Mean                              μ
Mode                              μ
Variance                          σ²
Skewness                          0
Kurtosis                          Γ(5/ν) Γ(1/ν) / Γ²(3/ν)

Exponential
VoseExpon(β)
Graphs

The Expon(β) is a right-skewed distribution bounded at zero with a mean of β. It only has one shape. Examples
of the exponential distribution are given below.


Uses

The Expon(β) models the time until the occurrence of a first event in a Poisson process. For example:
the time until the next earthquake;
the decay of a particle in a mass of radioactive material;
the length of telephone conversations.
The parameter β is the mean time until the occurrence of the next event.

Example

An electronic circuit could be considered to have a constant instantaneous failure rate, meaning that at a small
interval of time it has the same probability of failing, given it has survived so far. Destructive tests show that the
circuit lasts on average 5200 hours of operation. The time until failure of any single circuit can be modelled as
Expon(5200) hours. Interestingly, if it conforms to a true Poisson process, this estimate will be independent of how
many hours of operation, if any, the circuit has already survived.
Equations

Probability density function      f(x) = (1/β) exp(−x/β)
Cumulative distribution function  F(x) = 1 − exp(−x/β)
Parameter restriction             β > 0
Domain                            x ≥ 0
Mean                              β
Mode                              0
Variance                          β²
Skewness                          2
Kurtosis                          9

Extreme Value Maximum
VoseExtValueMax(a, b)
Graph

The extreme value maximum models the maximum of a set of random variables that have an underlying distribution
belonging to the exponential family, e.g. exponential, gamma, Weibull, normal, lognormal, logistic and itself.


Uses
Engineers are often interested in extreme values of a parameter (like minimum strength, maximum impinging force)
because they are the values that determine whether a system will potentially fail. For example: wind strengths
impinging on a building - it must be designed to sustain the largest wind with minimum damage within the bounds
of the finances available to build it; maximum wave height for designing offshore platforms, breakwaters and dikes;
pollution emissions for a factory to ensure that, at its maximum, it will fall below the legal limit; determining the
strength of a chain, since it is equal to the strength of its weakest link; modelling the extremes of meteorological
events since these cause the greatest impact.
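A small illustration in Python (numpy assumed; the daily-load figures are invented for the demo): block maxima of many ordinary variables settle into the extreme value maximum shape, whose parameters can then be moment-matched using the mean and variance formulas in the table below.

    import numpy as np

    rng = np.random.default_rng(12)
    daily = rng.normal(50, 10, size=(20_000, 365))   # 20 000 simulated years of daily loads
    yearly_max = daily.max(axis=1)                   # block maxima ~ ExtValueMax(a, b)

    # Moment match: variance = b^2 * pi^2 / 6, mean = a + 0.577216 * b
    b = yearly_max.std() * np.sqrt(6) / np.pi
    a = yearly_max.mean() - 0.577216 * b
    print(a, b)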
Equations

Probability density function      f(x) = (1/b) exp(−(x − a)/b) exp(−exp(−(x − a)/b))
Cumulative distribution function  F(x) = exp(−exp(−(x − a)/b))
Parameter restriction             b > 0
Domain                            −∞ < x < +∞
Mean                              a − b Γ′(1),  where Γ′(1) ≈ −0.577216
Mode                              a
Variance                          b²π²/6
Skewness                          1.139547
Kurtosis                          5.4


Extreme Value Minimum
VoseExtValueMin(a, b)

Graph

The extreme value minimum models the minimum of a set of random variables that have an underlying distribution
belonging to the exponential family, e.g. exponential, gamma, Weibull, normal, lognormal, logistic and itself.

Uses
Engineers are often interested in extreme values of a parameter (like minimum strength, maximum impinging force)
because they are the values that determine whether a system will potentially fail. For example: wind strengths
impinging on a building - it must be designed to sustain the largest wind with minimum damage within the bounds
of the finances available to build it; maximum wave height for designing offshore platforms, breakwaters and dikes;
pollution emissions for a factory to ensure that, at its maximum, it will fall below the legal limit; determining the
strength of a chain, since it is equal to the strength of its weakest link; modelling the extremes of meteorological
events since these cause the greatest impact.
Equations

Probability density function      f(x) = (1/b) exp((x − a)/b) exp(−exp((x − a)/b))
Cumulative distribution function  F(x) = 1 − exp(−exp((x − a)/b))
Parameter restriction             b > 0
Domain                            −∞ < x < +∞
Mean                              a + b Γ′(1),  where Γ′(1) ≈ −0.577216
Mode                              a
Variance                          b²π²/6
Skewness                          −1.139547
Kurtosis                          5.4

F

Graphs
The F distribution (sometimes known as the Fisher-Snedecor distribution, and taking Fisher's initial) is commonly used in a variety of statistical tests. It is derived from the ratio of two normalised chi-squared distributions with ν₁ and ν₂ degrees of freedom as follows:

    F(ν₁, ν₂) = [ChiSq(ν₁)/ν₁] / [ChiSq(ν₂)/ν₂]

Uses
The most common use of the F distribution you'll see in statistics textbooks is to compare the variance between
two (assumed normally distributed) populations. From a risk analysis perspective, it is very infrequent that we
would wish to model the ratio of two estimated variances (which is essentially the F-test in this circumstance), so
the F distribution is not particularly useful to us.
Equations

Cumulative distribution function  No closed form
Parameter restriction             ν₁ > 0, ν₂ > 0
Domain                            x ≥ 0
Mean                              ν₂/(ν₂ − 2)   for ν₂ > 2
Mode                              ν₂(ν₁ − 2) / [ν₁(ν₂ + 2)]   if ν₁ > 2
                                  0                           if ν₁ = 2
Variance                          2ν₂²(ν₁ + ν₂ − 2) / [ν₁(ν₂ − 2)²(ν₂ − 4)]   if ν₂ > 4
Skewness                          (2ν₁ + ν₂ − 2) √(8(ν₂ − 4)) / [(ν₂ − 6) √(ν₁(ν₁ + ν₂ − 2))]   if ν₂ > 6
Kurtosis                          Complicated

Fatigue Life
VoseFatigue(α, β, γ)

Graphs
The fatigue life distribution is a right-skewed distribution bounded at α; β is a scale parameter, while γ controls
its shape. Examples of the fatigue life distribution are given below.

Uses
The fatigue life distribution was originally derived in Birnbaum and Saunders (1969) as the failure of a structure
owing to the growth of cracks. The conceptual model had a single dominant crack appear and grow as the structure


experiences repeated shock patterns up to the point that the crack is sufficiently long to cause failure. Assuming
that the incremental growth of a crack with each shock follows the same distribution, that each incremental growth
is an independent sample from that distribution and that there are a large number of these small increases in length
before failure, the total crack length will follow a normal distribution from the central limit theorem. Birnbaum
and Saunders determined the distribution of the number of these cycles necessary to cause failure. If the shocks
occur more or less regularly in time, we can replace the probability that the structure will fail by a certain number
of shocks with the probability that it fails within a certain amount of time.
Thus, the fatigue life distribution is used a great deal to model the lifetime of a device suffering from
fatigue. Other distributions in common use to model the lifetime of a device are the lognormal, exponential
and Weibull.
Big assumptions, so be careful in using this distribution regardless of its popularity. If the growth is likely to
be proportional to the crack size, the lognormal distribution is more appropriate.

Equations
Probability density function      f(x) = (√z + √(1/z)) / (2γ(x − α)) φ((√z − √(1/z))/γ)   where z = (x − α)/β
Cumulative distribution function  F(x) = Φ((√z − √(1/z))/γ)
                                  where Φ = unit normal cdf and φ = unit normal density
Parameter restriction             β > 0, γ > 0
Domain                            x > α
Mean                              α + β(1 + γ²/2)
Mode                              Complicated
Variance                          (βγ)²(1 + 5γ²/4)
Skewness                          4γ(11γ² + 6) / (5γ² + 4)^(3/2)
Kurtosis                          Complicated

Gamma
VoseGamma(α, β)
Graphs
The gamma distribution is right skewed and bounded at zero. It is a parametric distribution based on Poisson
mathematics. Examples of the gamma distribution are given below.


Uses
The gamma distribution is extremely important in risk analysis modelling, with a number of different uses (a short simulation sketch of use 1 follows this list):
1. Poisson waiting time. The Gamma(α, β) distribution models the time required for α events to occur, given that the events occur randomly in a Poisson process with a mean time between events of β. For example, if we know that major flooding occurs in a town on average every 6 years, Gamma(4, 6) models how many years it will take before the next four floods have occurred.
2. Random variation of a Poisson intensity λ. The gamma distribution is used for its convenience as a description of random variability of λ in a Poisson process. It is convenient because of the identity

    Poisson(Gamma(α, β)) = NegBin(α, 1/(1 + β))

The gamma distribution can take a variety of shapes, from an exponential to a normal, so random variations in λ for a Poisson can often be well approximated by some gamma, in which case the NegBin distribution becomes a neat combination of the two.
3. Conjugate prior distribution in Bayesian inference. In Bayesian inference, the Gamma(α, β) distribution is the conjugate prior to the Poisson likelihood function, which makes it a useful distribution to describe the uncertainty about the Poisson mean λ.
4. Prior distribution for normal Bayesian inference. If X is Gamma(α, β) distributed, then Y = X^(−1/2) is an inverted gamma distribution (InvGamma(α, β)), which is sometimes used as a Bayesian prior for σ of a normal distribution.
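The sketch of use 1 in Python (numpy assumed), echoing the flood example:

    import numpy as np

    rng = np.random.default_rng(13)

    # Time until the next 4 floods when floods follow a Poisson process with a mean
    # time between floods of 6 years: Gamma(4, 6), i.e. the sum of 4 Expon(6) waits.
    years = rng.gamma(shape=4, scale=6, size=100_000)
    waits = rng.exponential(scale=6, size=(100_000, 4)).sum(axis=1)

    print(years.mean(), waits.mean())     # both ~ 24 years
    print(np.percentile(years, 90))       # 90th percentile of the waiting time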

Equations

Probability density function      f(x) = β^(−α) x^(α−1) exp(−x/β) / Γ(α)
Cumulative distribution function  No closed form
Parameter restriction             α > 0, β > 0
Domain                            x ≥ 0
Mean                              αβ
Mode                              β(α − 1)   if α ≥ 1
                                  0          if α < 1
Variance                          αβ²
Skewness                          2/√α
Kurtosis                          3 + 6/α

Geometric
VoseGeometric(p)
Graphs

Geometric(p) models the number of failures that will occur before the first success in a set of binomial trials,
given that p is the probability of a trial succeeding.

Uses
1. Dry oil wells. The geometric distribution is sometimes quoted as useful to estimate the number of dry wells
an oil company will drill in a particular section before getting a producing well. That would, however, be
assuming that (a) the company doesn't learn from its mistakes and (b) it has the money and obstinacy to keep
drilling new wells in spite of the cost.

Equations

Probability mass function         f(x) = p(1 − p)^x
Cumulative distribution function  F(x) = 1 − (1 − p)^(x+1)
Parameter restriction             0 < p ≤ 1
Domain                            x = {0, 1, 2, ...}
Mean                              (1 − p)/p
Mode                              0
Variance                          (1 − p)/p²
Skewness                          (2 − p)/√(1 − p)
Kurtosis                          9 + p²/(1 − p)


Johnson Bounded
VoseJohnsonB(α₁, α₂, min, max)

Graphs

Uses
The Johnson bounded distribution has a range defined by the min and max parameters. Combined with its flexibility
in shape, this makes it a viable alternative to the PERT, triangular and uniform distributions for modelling expert
opinion. A public domain software product called VISIFIT allows the user to define the bounds and pretty much
any two statistics for the distribution (mode, mean, standard deviation) and will return the corresponding distribution
parameters.
Setting min to 0 and max to 1 gives a random variable that is sometimes used to model ratios, probabilities,
etc., instead of a beta distribution.
The distribution name comes from Johnson (1949), who proposed a system for categorising distributions, in much the same spirit that Pearson did. Johnson's idea was to translate distributions to be a function of a unit normal distribution, one of the few distributions for which good tools were available at the time.
Equations

Probability density function      f(x) = α₂(max − min) / [√(2π)(x − min)(max − x)] exp(−½[α₁ + α₂ ln((x − min)/(max − x))]²)
Cumulative distribution function  F(x) = Φ(α₁ + α₂ ln[(x − min)/(max − x)])
                                  where Φ[·] is the distribution function for a Normal(0, 1)
Parameter restriction             α₂ > 0, max > min
Domain                            min < x < max
Mean                              Complicated
Mode                              Complicated
Variance                          Complicated
Skewness                          Complicated
Kurtosis                          Complicated

Johnson Unbounded
VoseJohnsonU(α₁, α₂, β, γ)

Graphs

Uses
The main use of the Johnson unbounded distribution is that it can be made to have any combination of skewness
and kurtosis. Thus, it provides a flexible distribution to fit to data by matching these moments. That said, it is an
infrequently used distribution in risk analysis.
The distribution name comes from Johnson (1949), who proposed a system for categorising distributions, in much the same spirit that Pearson did. Johnson's idea was to translate distributions to be a function of a unit normal distribution, one of the few distributions for which good tools were available at the time.
Equations

Cumulative distribution function  F(x) = Φ(α₁ + α₂ ln[z + √(z² + 1)])   where z = (x − γ)/β
                                  and Φ[·] is the distribution function for a Normal(0, 1)

Kumaraswamy

Graphs

Uses
The Kumaraswamy distribution is not widely used, but, for example, it has been applied to model the storage
volume of a reservoir (Fletcher and Ponnambalam, 1996) and system design. It has a simple form for its density
and cumulative distributions and is very flexible, like the beta distribution (which does not have simple forms for
these functions). It will probably have a lot more applications as it becomes better known.
Equations
Probability density function      f(x) = αβ x^(α−1) (1 − x^α)^(β−1)
Cumulative distribution function  F(x) = 1 − (1 − x^α)^β
Parameter restriction             α > 0, β > 0
Domain                            0 ≤ x ≤ 1
Mean                              β Γ(1 + 1/α) Γ(β) / Γ(1 + β + 1/α)
Mode                              0                              if α < 1
                                  1                              if β < 1
                                  ((α − 1)/(αβ − 1))^(1/α)        if α > 1 and β > 1

Kumaraswamy4

Graphs

Uses

A Kumaraswamy distribution stretched and shifted to have a specified minimum and maximum, in the same fashion that a beta4 is a stretched and shifted beta distribution.
Equations
Probability density function      f(x) = αβ z^(α−1) (1 − z^α)^(β−1) / (max − min)   where z = (x − min)/(max − min)
Cumulative distribution function  F(x) = 1 − (1 − z^α)^β
Parameter restriction             α > 0, β > 0, min < max
Domain                            min ≤ x ≤ max
Mean                              min + (max − min) β Γ(1 + 1/α) Γ(β) / Γ(1 + β + 1/α)
Mode                              min + (max − min) ((α − 1)/(αβ − 1))^(1/α)   if α > 1 and β > 1
Variance                          Complicated
Skewness                          Complicated
Kurtosis                          Complicated

Laplace
VoseLaplace(μ, σ)

Graphs
If X and Y are two identical independent exponential distributions, and if X is shifted μ to the right of Y, then (X − Y) is a Laplace(μ, σ) distribution. The Laplace distribution has a strange shape with a sharp peak and tails that are longer than the tails of a normal distribution. The figure below plots a Laplace(0, 1) against a Normal(0, 1) distribution.


Uses
The Laplace has found a variety of very specific uses, but they nearly all relate to the fact that it has long tails.
Comments
When μ = 0 and σ = 1 we have the standard form of the Laplace distribution, which is also occasionally called
"Poisson's first law of error". The Laplace distribution is also known as the double-exponential distribution (although


the Gumbel extreme-value distribution also takes this name), the "two-tailed exponential" and the "bilateral exponential
distribution".
Equations

Probability density function      f(x) = 1/(σ√2) exp(−√2 |x − μ|/σ)
Cumulative distribution function  F(x) = ½ exp(−√2 (μ − x)/σ)        if x < μ
                                  F(x) = 1 − ½ exp(−√2 (x − μ)/σ)    if x ≥ μ
Parameter restriction             −∞ < μ < +∞, σ > 0
Domain                            −∞ < x < +∞
Mean                              μ
Mode                              μ
Variance                          σ²
Skewness                          0
Kurtosis                          6

Levy
VoseLevy(c, a)

Graphs
The Lévy distribution, named after Paul Pierre Lévy, is one of the few distributions that are stable¹ and that have probability density functions that are analytically expressible. The others are the normal distribution and the Cauchy distribution.

¹ A distribution is said to be stable if summing independent random variables from that distribution results in a random variable from that same distribution type.


Uses

The Lévy distribution is sometimes used in financial engineering to model price changes. This distribution takes into
account the leptokurtosis ("fat" tails) one sometimes empirically observes in price changes on financial markets.
Equations
Probability density function      f(x) = √(c/(2π)) exp(−c/(2(x − a))) / (x − a)^(3/2)
Cumulative distribution function  No closed form
Parameter restriction             c > 0
Domain                            x > a
Mean                              Infinite
Mode                              a + c/3
Variance                          Infinite
Skewness                          Undefined
Kurtosis                          Undefined

Logarithmic
VoseLogarithmic(θ)
Graphs
The logarithmic distribution (sometimes known as the logarithmic series distribution) is a discrete, positive distribution, peaking at x = 1, with one parameter and a long right tail. The figures below show two examples of the
logarithmic distribution.


Uses
The logarithmic distribution is not very commonly used in risk analysis. However, it has been used to describe, for
example: the number of items purchased by a consumer in a particular period; the number of bird and plant species
in an area; and the number of parasites per host. There is some theory that relates the latter two to an observation by
Newcomb (1881) that the frequency of use of different digits in natural numbers followed a logarithmic distribution.
Equations
Probability mass function         f(x) = −θ^x / (x ln(1 − θ))
Cumulative distribution function  F(x) = −(1/ln(1 − θ)) Σ_{i=1}^{⌊x⌋} θ^i / i
Parameter restriction             0 < θ < 1
Domain                            x = {1, 2, 3, ...}
Mean                              θ / [(θ − 1) ln(1 − θ)]
Mode                              1
Variance                          μ((1 − θ)^(−1) − μ) = V,  where μ is the mean
Skewness                          [−θ / ((1 − θ)³ V^(3/2) ln(1 − θ))] [1 + θ + 3θ/ln(1 − θ) + 2θ²/ln²(1 − θ)]
Kurtosis                          [−θ / ((1 − θ)⁴ V² ln(1 − θ))] [1 + 4θ + θ² + 4θ(1 + θ)/ln(1 − θ) + 6θ²/ln²(1 − θ) + 3θ³/ln³(1 − θ)]

LogGamma

Graphs


Uses
A variable X is loggamma distributed if its natural log is gamma distributed. In ModelRisk we include an extra shift parameter γ because a standard loggamma distribution has a minimum value of 1 when the gamma variable = 0. Thus

    LogGamma(α, β, γ) = EXP[Gamma(α, β)] + (γ − 1)

The loggamma distribution is sometimes used to model the distribution of claim size in insurance. Set γ = 1 to get the standard LogGamma(α, β) distribution.
Equations

Probability density function      f(x) = [ln(x − γ + 1)]^(α−1) exp(−ln(x − γ + 1)/β) / [(x − γ + 1) β^α Γ(α)]
Cumulative distribution function  No closed form
Parameter restriction             α > 0, β > 0
Domain                            x ≥ γ
Mean                              (1 − β)^(−α) + γ − 1   if β < 1
Mode                              exp(β(α − 1)/(β + 1)) + γ − 1   if α > 1
                                  γ                                else
Variance                          (1 − 2β)^(−α) − (1 − β)^(−2α) = V   if β < 1/2
Skewness                          [(1 − 3β)^(−α) − 3(1 − β)^(−α)(1 − 2β)^(−α) + 2(1 − β)^(−3α)] / V^(3/2)   if β < 1/3
Kurtosis                          [(1 − 4β)^(−α) − 4(1 − β)^(−α)(1 − 3β)^(−α) + 6(1 − β)^(−2α)(1 − 2β)^(−α) − 3(1 − β)^(−4α)] / V²   if β < 1/4

Logistic
VoseLogistic(α, β)
Graphs
The logistic distribution looks similar to the normal distribution but has a kurtosis of 4.2 compared with the normal
kurtosis of 3. Examples of the logistic distribution are given below.


Uses

The logistic distribution is popular in demographic and economic modelling because it is similar to the normal
distribution but somewhat more peaked. It does not appear often in risk analysis modelling.
Comments

The cumulative function has also been used as a model for a "growth curve". Sometimes called the sech-squared
distribution because its distribution function can be written in a form that includes a sech. Its mathematical derivation
is the limiting distribution as n approaches infinity of the standardised mid-range (average of the highest and lowest
values) of a random sample of size n from an exponential-type distribution.
Equations

Probability density function      f(x) = z / [β(1 + z)²]   where z = exp(−(x − α)/β)
Cumulative distribution function  F(x) = 1/(1 + z)
Parameter restriction             β > 0
Domain                            −∞ < x < +∞
Mean                              α
Mode                              α
Variance                          (βπ)²/3
Skewness                          0
Kurtosis                          4.2

LogLaplace

Graphs
Examples of the logLaplace distribution are given below. δ is just a scaling factor, giving the location of the point of inflection of the density function. The logLaplace distribution takes a variety of shapes, depending on the value of β. For example, when β = 1, the logLaplace distribution is uniform for x < δ.

Uses
Kozubowski and Podgórski (no date) review many uses of the logLaplace distribution. The most commonly quoted use (for the symmetric logLaplace) has been for modelling "moral fortune", a state of well-being that is the logarithm of income, based on a formula by Daniel Bernoulli.
The asymmetric logLaplace distribution has been fitted to pharmacokinetic and particle size data (particle size studies often show the log size to follow a tent-shaped distribution like the Laplace). It has been used to model
growth rates, stock prices, annual gross domestic production, interest and forex rates. Some explanation for the
goodness of fit of the logLaplace has been suggested because of its relationship to Brownian motion stopped at a
random exponential time.
Equations
Probability density function      f(x) = (αβ/(δ(α + β))) (x/δ)^(β−1)   for 0 ≤ x < δ
                                  f(x) = (αβ/(δ(α + β))) (δ/x)^(α+1)   for x ≥ δ
Parameter restriction             α > 0, β > 0, δ > 0
Domain                            0 ≤ x < +∞
Mean                              δαβ / [(α − 1)(β + 1)]   for α > 1
Mode                              0                         for β < 1
                                  no unique mode            for β = 1
                                  δ                         for β > 1
Variance                          δ²αβ / [(α − 2)(β + 2)] − μ²   for α > 2
Skewness                          (1/V^(3/2)) [δ³αβ / ((α − 3)(β + 3)) − 3(V + μ²)μ + 2μ³]   for α > 3
Kurtosis                          (1/V²) [δ⁴αβ / ((α − 4)(β + 4)) − 4μ δ³αβ / ((α − 3)(β + 3)) + 6(V + μ²)μ² − 3μ⁴]   for α > 4

LogLogistic

Graphs
When log(X) takes a logistic distribution, then X takes a loglogistic distribution. Their parameters are related as
follows:

LogLogistic(α, 1) is the standard loglogistic distribution.


Uses
The loglogistic distribution has the same relationship to the logistic distribution as the lognormal distribution has
to the normal distribution. If you feel that a variable is driven by some process that is the product of a number of
variables, then a natural distribution to use is the lognormal because of the central limit theorem. However, if one
or two of these factors could be dominant, or correlated, so that the distribution is less spread than a lognormal,
then the loglogistic may be an appropriate distribution to try.
Equations

Probability density function      f(x) = α(x/β)^(α−1) / [β(1 + (x/β)^α)²]
Cumulative distribution function  F(x) = 1 / (1 + (β/x)^α)
Parameter restriction             α > 0, β > 0
Domain                            x ≥ 0
Mean                              βθ csc(θ)   where θ = π/α, for α > 1
Mode                              β((α − 1)/(α + 1))^(1/α)   for α > 1
                                  0                           for α ≤ 1
Variance                          β²θ[2 csc(2θ) − θ csc²(θ)]   for α > 2
Skewness                          [3 csc(3θ) − 6θ csc(2θ)csc(θ) + 2θ² csc³(θ)] / {√θ [2 csc(2θ) − θ csc²(θ)]^(3/2)}   for α > 3
Kurtosis                          [6θ² csc³(θ)sec(θ) + 4 csc(4θ) − 3θ³ csc⁴(θ) − 12θ csc(θ)csc(3θ)] / {θ[2 csc(2θ) − θ csc²(θ)]²}   for α > 4


Lognormal
VoseLognormal(μ, σ)

Graphs


Uses

The lognormal distribution is useful for modelling naturally occurring variables that are the product of a number of other naturally occurring variables. The central limit theorem shows that the product of a large number of independent random variables is lognormally distributed. For example, the volume of gas in a petroleum reserve is often lognormally distributed because it is the product of the area of the formation, its thickness, the formation pressure, porosity and the gas:liquid ratio.
Lognormal distributions often provide a good representation for a physical quantity that extends from zero to +infinity and is positively skewed, perhaps because some central limit theorem type of process is determining the variable's size. Lognormal distributions are also very useful for representing quantities that are thought of in orders of magnitude. For example, if a variable can be estimated to within a factor of 2 or to within an order of magnitude, the lognormal distribution is often a reasonable model.
Lognormal distributions have also been used to model lengths of words and sentences in a document, particle sizes in aggregates, critical doses in pharmacy and incubation periods of infectious diseases, but one reason the lognormal distribution appears so frequently is because it is easy to fit and test (one simply transforms the data to logs and manipulates them as a normal distribution), and so observing its use in your field does not necessarily mean it is a good model: it may just have been a convenient one. Modern software and statistical techniques have removed much of the need for assumptions of normality, so be cautious about using the lognormal just because it has always been done that way.
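A quick demonstration of that multiplicative central limit effect in Python (numpy assumed; the factors are invented for the demo): the product of many independent positive factors is approximately lognormal, i.e. its logarithm is approximately normal.

    import numpy as np

    rng = np.random.default_rng(14)

    # Product of 12 independent positive factors, e.g. area x thickness x porosity x ...
    factors = rng.uniform(0.8, 1.25, size=(100_000, 12))
    product = factors.prod(axis=1)

    logs = np.log(product)
    print(logs.mean(), logs.std())                 # roughly normal
    print(np.percentile(product, [5, 50, 95]))     # right skewed, bounded at zero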
Equations

Probability density function      f(x) = 1/(x σ₁ √(2π)) exp(−(ln x − μ₁)² / (2σ₁²))
                                  where μ₁ = ln(μ²/√(μ² + σ²)) and σ₁ = √(ln(1 + σ²/μ²))
Cumulative distribution function  No closed form
Parameter restriction             σ > 0, μ > 0
Domain                            x ≥ 0
Mean                              μ
Mode                              exp(μ₁ − σ₁²)
Variance                          σ²
Skewness                          (σ/μ)³ + 3σ/μ
Kurtosis                          z⁴ + 2z³ + 3z² − 3   where z = 1 + (σ/μ)²

Lognormal, base B

Graphs

Uses
Scientists often describe data in log form, usually log base 10. This distribution allows the user to specify the base
B so, for example:

Equations

Probability density function      f(x) = 1/(x √(2πV)) exp(−(ln x − m)² / (2V))
                                  where V = (σ ln B)² and m = μ ln B
Cumulative distribution function  No closed form
Parameter restriction             σ > 0, μ > 0, B > 0
Domain                            x ≥ 0
Mean                              exp(m + V/2)
Mode                              exp(m − V)
Variance                          exp(2m + V)(exp(V) − 1)
Skewness                          (exp(V) + 2)√(exp(V) − 1)
Kurtosis                          exp(4V) + 2exp(3V) + 3exp(2V) − 3

LognormalE

Graphs

Uses
If a random variable X is such that ln[X] follows a normal distribution, then X follows a lognormal distribution.
We can specify X with the mean and standard deviation of the lognormal distribution (see the VoseLognormal
distribution above) or by the mean and standard deviation of the corresponding normal distribution, used here. So:

Equations

Probability density function      f(x) = 1/(x σ √(2π)) exp(−(ln x − μ)² / (2σ²))
Cumulative distribution function  No closed form
Parameter restriction             σ > 0, μ > 0
Domain                            x ≥ 0
Mean                              exp(μ + σ²/2)
Mode                              exp(μ − σ²)
Variance                          exp(2μ + σ²)(exp(σ²) − 1)
Skewness                          (exp(σ²) + 2)√(exp(σ²) − 1)
Kurtosis                          exp(4σ²) + 2exp(3σ²) + 3exp(2σ²) − 3

Modified PERT
VoseModPERT(min, mode, max, γ)
Graphs

David Vose developed a modification of the PERT distribution with minimum min, most likely mode and maximum max to produce shapes with varying degrees of uncertainty for the min, mode, max values by changing the assumption about the mean:

    μ = (min + γ mode + max) / (γ + 2)

In the standard PERT, γ = 4, which is the PERT network assumption that the best estimate of the duration of a task = (min + 4 mode + max)/6. However, if we increase the value of γ, the distribution becomes progressively more peaked and concentrated around mode (and therefore less uncertain). Conversely, if we decrease γ, the distribution becomes flatter and more uncertain. The figure below illustrates the effect of three different values of γ for a modified PERT(5, 7, 10) distribution.


Uses
This modified PERT distribution can be very useful in modelling expert opinion. The expert is asked to estimate three values (minimum, most likely and maximum). Then a set of modified PERT distributions is plotted, and the expert is asked to select the shape that fits his/her opinion most accurately.
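A sketch of how γ maps to the underlying beta parameters (using the α₁, α₂ expressions in the table below), in Python with numpy and scipy assumed; the function name mod_pert_rvs is illustrative and the values echo the modified PERT(5, 7, 10) example:

    import numpy as np
    from scipy.stats import beta

    def mod_pert_rvs(mn, mode, mx, gamma, size, rng):
        # alpha1, alpha2 as in the equations table; samples are rescaled beta variates
        a1 = 1 + gamma * (mode - mn) / (mx - mn)
        a2 = 1 + gamma * (mx - mode) / (mx - mn)
        return mn + (mx - mn) * beta.rvs(a1, a2, size=size, random_state=rng)

    rng = np.random.default_rng(15)
    for g in (2, 4, 10):
        x = mod_pert_rvs(5, 7, 10, g, 100_000, rng)
        print(g, x.mean(), x.std())   # larger gamma: mean closer to 7, smaller spread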
Equations

Probability density function      f(x) = (x − min)^(α₁−1) (max − x)^(α₂−1) / [B(α₁, α₂)(max − min)^(α₁+α₂−1)]
                                  where α₁ = 1 + γ (mode − min)/(max − min), α₂ = 1 + γ (max − mode)/(max − min)
                                  and B(α₁, α₂) is a beta function
Cumulative distribution function  F(x) = B_z(α₁, α₂) / B(α₁, α₂) = I_z(α₁, α₂)
                                  where z = (x − min)/(max − min) and B_z(α₁, α₂) is an incomplete beta function
Parameter restriction             min < mode < max, γ > 0
Domain                            min ≤ x ≤ max

Mean                              (min + γ mode + max) / (γ + 2)

Negative Binomial

Equations

Probability mass function         f(x) = C(s + x − 1, x) p^s (1 − p)^x
Parameter restriction             0 < p ≤ 1; s > 0, where s is an integer
Domain                            x = {0, 1, 2, ...}
Mean                              s(1 − p)/p
Mode                              z and z + 1 if z is an integer, ⌊z⌋ + 1 otherwise, where z = (s(1 − p) − 1)/p
Variance                          s(1 − p)/p²
Skewness                          (2 − p) / √(s(1 − p))
Kurtosis                          3 + 6/s + p²/(s(1 − p))

Normal
VoseNormal(μ, σ)
Graphs

Uses

1. Modelling a naturally occurring variable. The normal, or gaussian, distribution occurs in a wide variety of
applications owing, in part, to the central limit theorem which makes it a good approximation to many other
distributions.
It is frequently observed that variations in a naturally occurring variable are approximately normally distributed; for example: the height of adult European males, arm span, etc.
Population data tend approximately to fit to a normal curve, but the data usually have a little more density
in the tails.
2. Distribution of errors. A normal distribution is frequently used in statistical theory for the distribution of errors
(for example, in least-squares regression analysis).
3. Approximation of uncertainty distribution. A basic rule of thumb in statistics is that, the more data you have,
the more the uncertainty distribution of the estimated parameter approaches a normal. There are various ways
of looking at it: from a Bayesian perspective, a Taylor series expansion of the posterior density is helpful; from
a frequentist perspective, a central limit theorem argument is often appropriate: binomial example, Poisson
example.
4. Convenience distribution. The most common use of a normal distribution is simply for its convenience. For
example, to add normally distributed (uncorrelated and correlated) random variables, one combines the means
and variances in simple ways to obtain another normal distribution.
Classical statistics has grown up concentrating on the normal distribution, including trying to transform data so
that they look normal. The Student t-distribution and the chi-square distribution are based on a normal assumption.
It's the distribution we learn at college. But take care that, when you select a normal distribution, it is not simply

Ogive

Graphs
Cumulative graph of the Ogive distribution.

Uses
The Ogive distribution is used to convert a set of data values into an empirical distribution (see Section 10.2.1).

Equations

Probability density function:
f(x) = [1/(n + 1)]·1/(x_{i+1} − x_i)   for x_i ≤ x < x_{i+1}, i ∈ {0, 1, ..., n}
where x_0 = min, x_{n+1} = max, P_0 = 0, P_{n+1} = 1

Cumulative distribution function:
F(x_i) = i/(n + 1)

Parameter restriction: x_i < x_{i+1}, n ≥ 0
Domain: min ≤ x ≤ max
Mean: [1/(2(n + 1))]·Σ_{i=0}^{n} (x_{i+1} + x_i)
Mode: No unique mode
Variance: Complicated
Skewness: Complicated
Kurtosis: Complicated

Pareto
VosePareto(θ, a)
Graphs
The Pareto distribution has an exponential type of shape: right skewed, where the mode and minimum are equal. It starts at a and has a rate of decrease determined by θ: the larger the θ, the quicker its tail falls away. Examples of the Pareto distribution are given below.


Uses

1. Demographics. The Pareto distribution was originally used to model the number of people with an income of at least x, but it is now used to model any variable that has a minimum (and most likely) value a and for which the probability density decreases geometrically towards zero.
The Pareto distribution has also been used for city population sizes, occurrences of natural resources, stock price fluctuations, size of companies, personal income and error clustering in communication circuits.
An obvious use of the Pareto is for insurance claims. Insurance policies are written so that it is not worth claiming below a certain value a, and the probability of a claim greater than a is assumed to decrease as a power function of the claim size. It turns out, however, that the Pareto distribution is generally a poor fit.

2. Long-tailed variable. The Pareto distribution has the longest tail of all probability distributions. Thus, while it is not a good fit for the bulk of a variable like a claim size distribution, it is frequently used to model the tails by splicing it with another distribution like a lognormal. That way an insurance company is reasonably guaranteed to have a fairly conservative interpretation of what the (obviously rarely seen, but potentially catastrophic) very high claim values might be. It can also be used to model a longer-tailed discrete variable than any other distribution.

Equations

Probability density function:
f(x) = θ·a^θ / x^(θ+1)

Cumulative distribution function:
F(x) = 1 − (a/x)^θ

Parameter restriction: θ > 0, a > 0
Domain: a ≤ x
Mean: θa/(θ − 1)   for θ > 1
Mode: a
Variance: θa²/[(θ − 1)²(θ − 2)]   for θ > 2
Skewness: [2(θ + 1)/(θ − 3)]·√((θ − 2)/θ)   for θ > 3
Kurtosis: 3(θ − 2)(3θ² + θ + 2)/[θ(θ − 3)(θ − 4)]   for θ > 4

Pareto2
VosePareto2(b, q)

Graphs
This distribution is simply a standard Pareto distribution shifted along the x axis so that it starts at x = 0. This is most readily apparent by comparing the cumulative distribution functions of the two distributions:

Pareto: F(x) = 1 − (a/x)^θ
Pareto2: F(x) = 1 − b^q/(x + b)^q

The only difference between the two equations is that x for the Pareto has been replaced by (x + b) for the Pareto2. In other words, using the notation above, the two match when a = b and θ = q. Thus, both distributions have the same variance and shape when a = b and θ = q, but different means.

Uses

See the Pareto distribution.
Equations

Probability density function:
f(x) = q·b^q/(x + b)^(q+1)

Cumulative distribution function:
F(x) = 1 − b^q/(x + b)^q

Parameter restriction: b > 0, q > 0
Domain: 0 ≤ x < +∞
Mean: b/(q − 1)   for q > 1
Mode: 0
Variance: b²q/[(q − 1)²(q − 2)]   for q > 2
Skewness: [2(q + 1)/(q − 3)]·√((q − 2)/q)   for q > 3
Kurtosis: 3(q − 2)(3q² + q + 2)/[q(q − 3)(q − 4)]   for q > 4


Pearson Type 5
VosePearson5(α, β)

Graphs
The Pearson family of distributions was designed by Pearson between 1890 and 1895. It represents a system whereby, for every member, the probability density function f(x) satisfies a differential equation, equation (1), whose solution shape depends on the values of the parameters a, c0, c1 and c2. The Pearson type 5 corresponds to the case where c0 + c1·x + c2·x² is a perfect square (c1² = 4·c0·c2), in which case equation (1) can be rewritten with a perfect square in the denominator.
Examples of the Pearson type 5 distribution are given below.

Uses

This distribution is very rarely used correctly in risk analysis.

Equations

Probability density function:
f(x) = exp(−β/x) / [β·Γ(α)·(x/β)^(α+1)]

Cumulative distribution function: No closed form

Parameter restriction: α > 0, β > 0
Domain: 0 ≤ x < +∞


Mean: β/(α − 1)   for α > 1
Mode: β/(α + 1)
Variance: β²/[(α − 1)²(α − 2)]   for α > 2
Skewness: 4√(α − 2)/(α − 3)   for α > 3
Kurtosis: 3(α + 5)(α − 2)/[(α − 3)(α − 4)]   for α > 4

Pearson Type 6
VosePearson6(α1, α2, β)

Graphs
The Pearson type 6 distribution corresponds in the Pearson system to the case when the roots of c0 + c1·x + c2·x² = 0 are real and of the same sign. If they are both negative, denote them a1 and a2 (a1 ≤ a2). Since the expected value is greater than a2, it is clear that the range of variation of x must be x > a2.
Examples of the Pearson type 6 distribution are given below.


Uses
At Vose Consulting we don't find much use for this distribution (other than to generate an F distribution). The
distribution is very unlikely to reflect any of the processes that the analyst may come across, but its three parameters
(giving it flexibility), sharp peak and long tail make it a possible candidate to be fitted to a very large (so you
know the pattern is real) dataset to which other distributions won't fit well.
Like the Pearson type 5 distribution, the Pearson type 6 distribution hasn't proven to be very useful in risk
analysis.
Equations

Probability density function:
f(x) = (x/β)^(α1 − 1) / [β·B(α1, α2)·(1 + x/β)^(α1 + α2)]
where B(α1, α2) is a beta function

Cumulative distribution function: No closed form

Parameter restriction: α1 > 0, α2 > 0, β > 0
Domain: 0 ≤ x < +∞
Mean: βα1/(α2 − 1)   for α2 > 1
Mode: β(α1 − 1)/(α2 + 1) for α1 > 1; 0 otherwise
Variance: β²α1(α1 + α2 − 1)/[(α2 − 1)²(α2 − 2)]   for α2 > 2
Skewness: 2·√[(α2 − 2)/(α1(α1 + α2 − 1))]·(2α1 + α2 − 1)/(α2 − 3)   for α2 > 3
Kurtosis: [3(α2 − 2)/((α2 − 3)(α2 − 4))]·[2(α2 − 1)²/(α1(α1 + α2 − 1)) + (α2 + 5)]   for α2 > 4

PERT
VosePERT(min, mode, max)
Graphs
The PERT (aka betaPERT) distribution gets its name because it uses the same assumption about the mean (see below) as PERT networks (used in the past for project planning). It is a version of the beta distribution and requires the same three parameters as the triangular distribution, namely minimum (a), most likely (b) and maximum (c). The figure below shows three PERT distributions whose shape can be compared with the triangular distributions.


Uses

The PERT distribution is used exclusively for modelling expert estimates, where one is given the expert's minimum,
most likely and maximum guesses. It is a direct alternative to a triangular distribution.
Equations

Probability density function:
f(x) = (x − min)^(α1 − 1)·(max − x)^(α2 − 1) / [B(α1, α2)·(max − min)^(α1 + α2 − 1)]
where α1 = 6(μ − min)/(max − min), α2 = 6(max − μ)/(max − min)
with μ (= mean) = (min + 4·mode + max)/6
and B(α1, α2) is a beta function

Cumulative distribution function:
F(x) = B_z(α1, α2)/B(α1, α2) = I_z(α1, α2)
where z = (x − min)/(max − min) and B_z(α1, α2) is an incomplete beta function

Parameter restriction: min < mode < max
Domain: min ≤ x ≤ max
Mean: (min + 4·mode + max)/6 = μ
Mode: mode
Variance: (μ − min)(max − μ)/7
Skewness: 2(α2 − α1)·√(α1 + α2 + 1) / [(α1 + α2 + 2)·√(α1·α2)]
Kurtosis: 3(α1 + α2 + 1)·[2(α1 + α2)² + α1α2(α1 + α2 − 6)] / [α1α2(α1 + α2 + 2)(α1 + α2 + 3)]
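The beta relationship above suggests a direct way to generate PERT values in code. The following minimal Python sketch (an illustration, not part of ModelRisk; the function name pert_sample is mine) converts the three estimates into the beta parameters α1 and α2 defined above and rescales a beta sample onto [min, max]:

import numpy as np

def pert_sample(minimum, mode, maximum, size=1, rng=None):
    # Convert the three expert estimates into the beta parameters used above
    rng = np.random.default_rng() if rng is None else rng
    mu = (minimum + 4.0 * mode + maximum) / 6.0        # the PERT mean assumption
    a1 = 6.0 * (mu - minimum) / (maximum - minimum)
    a2 = 6.0 * (maximum - mu) / (maximum - minimum)
    # Rescale a Beta(a1, a2) sample from [0, 1] onto [min, max]
    return minimum + (maximum - minimum) * rng.beta(a1, a2, size)

# For example, 10 000 samples from a PERT(5, 7, 10):
samples = pert_sample(5, 7, 10, size=10_000)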

Poisson
VosePoisson(λt)
Graphs

Uses
The Poisson(λt) distribution models the number of occurrences of an event in a time t, with an expected rate of λ events per unit time, when the events arise from a Poisson process.
Example
If β is the mean time between events, as used by the exponential distribution, then λ = 1/β. For example, imagine that records show that a computer crashes on average once every 250 hours of operation (β = 250 hours); then the rate of crashing λ is 1/250 crashes per hour. Thus, a Poisson(1000/250) = Poisson(4) distribution models the number of crashes that could occur in the next 1000 hours of operation.
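As a quick numerical check of this example (a Python sketch rather than a spreadsheet formula), the probability of seeing x crashes in the next 1000 hours can be computed directly from the Poisson(4) probability mass function given below:

from math import exp, factorial

lam_t = 1000 / 250          # expected number of crashes in 1000 hours = 4

def poisson_pmf(x, lam):
    # f(x) = exp(-lambda) * lambda^x / x!
    return exp(-lam) * lam**x / factorial(x)

for x in range(9):
    print(x, round(poisson_pmf(x, lam_t), 4))
# e.g. the chance of exactly 4 crashes is about 0.195,
# and of no crashes at all about 0.018.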
Equations

Probability mass function:
f(x) = e^(−λt)·(λt)^x / x!

Cumulative distribution function:
F(x) = e^(−λt)·Σ_{i=0}^{⌊x⌋} (λt)^i / i!
Parameter restriction: λt > 0
Domain: x = {0, 1, 2, ...}
Mean: λt
Mode: λt, λt − 1 if λt is an integer; ⌊λt⌋ otherwise
Variance: λt
Skewness: 1/√(λt)
Kurtosis: 3 + 1/(λt)

Polya
VosePolya(α, β)

Graphs

Uses
There are several types of distribution in the literature that have been given the Pólya name. We employ the name for a distribution that is very common in the insurance field. A standard initial assumption for the frequency distribution of the number of claims is Poisson:

Claims = Poisson(λ)

where λ is the expected number of claims during the period of interest. The Poisson distribution has a mean and variance equal to λ, and one often sees historic claim frequencies with a variance greater than the mean, so that the Poisson model underestimates the level of randomness of claim numbers. A standard method to incorporate greater variance is to assume that λ is itself a random variable (the claim frequency distribution is then called a mixed Poisson model). A Gamma(α, β) distribution is most commonly used to describe the random variation in λ between periods, so

Claims = Poisson(Gamma(α, β))     (1)

This is the Pólya(α, β) distribution.
Comments

Relationship to the negative binomial

If α is an integer, we have

Claims = Poisson(Gamma(α, β)) = NegBin(α, 1/(1 + β))

so one can say that the negative binomial distribution is a special case of the Pólya.
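A minimal Python sketch (mine, not from the text, with example parameter values assumed) can be used to check this mixture by simulation: drawing a Poisson over a Gamma(α, β) intensity and comparing the sample mean and variance with the theoretical Pólya values αβ and αβ(1 + β) given in the table below:

import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 3, 2.5                                     # assumed example parameters

lam = rng.gamma(shape=alpha, scale=beta, size=200_000)   # random intensity per period
claims = rng.poisson(lam)                                # Poisson(Gamma(alpha, beta)) mixture

print(claims.mean(), alpha * beta)                       # both near 7.5
print(claims.var(), alpha * beta * (1 + beta))           # both near 26.25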
Equations

Probability mass function:
f(x) = Γ(α + x)·β^x / [Γ(x + 1)·Γ(α)·(1 + β)^(α + x)]

Cumulative distribution function:
F(x) = Σ_{i=0}^{x} Γ(α + i)·β^i / [Γ(i + 1)·Γ(α)·(1 + β)^(α + i)]

Parameter restriction: α > 0, β > 0
Domain: x = {0, 1, 2, ...}
Mean: αβ
Mode: 0 if α ≤ 1; z, z + 1 if z is an integer; ⌈z⌉ if z is not an integer, where z = β(α − 1) − 1
Variance: αβ(1 + β)
Skewness: (1 + 2β)/√(αβ(1 + β))
Kurtosis: 3 + 6/α + 1/(αβ(1 + β))

Rayleigh
VoseRayleigh(b)

Graphs
Originally derived by Lord Rayleigh (or, by his less glamorous name, J. W. Strutt) in the field of acoustics. The graph below shows various Rayleigh distributions. The distribution drawn with a solid line is a Rayleigh(1), sometimes referred to as the standard Rayleigh distribution.

Uses
The Rayleigh distribution is frequently used to model wave heights in oceanography, and in communication theory to describe hourly median and instantaneous peak power of received radio signals.
The distance from one individual to its nearest neighbour when the spatial pattern is generated by a Poisson process follows a Rayleigh distribution.
The Rayleigh distribution is a special case of the Weibull distribution, since Rayleigh(b) = Weibull(2, b√2), and as such is a suitable distribution for modelling the lifetime of a device that has a linearly increasing instantaneous failure rate: z(x) = x/b².
Equations

Probability density function:
f(x) = (x/b²)·exp(−x²/(2b²))

Cumulative distribution function:
F(x) = 1 − exp(−x²/(2b²))

Parameter restriction: b > 0
Domain: 0 ≤ x < +∞
Mean: b√(π/2)
Mode: b
Variance: b²(2 − π/2)
Skewness: 2√π(π − 3)/(4 − π)^(3/2) ≈ 0.6311
Kurtosis: (32 − 3π²)/(4 − π)² ≈ 3.2451

Reciprocal
VoseReciprocal(min, max)
Graphs

Uses
The reciprocal distribution is widely used as an uninformed prior distribution in Bayesian inference for scale parameters.
It is also used to describe "1/f noise". One way to characterise different noise sources is to consider the spectral density, i.e. the mean squared fluctuation at any particular frequency f and how that varies with frequency. A common approach is to model spectral densities that vary as powers of inverse frequency: the power spectrum P(f) is proportional to 1/f^β for β > 0. β = 0 equates to white noise (i.e. no relationship between P(f) and f), β = 2 is called Brownian noise, and β = 1 takes the name "1/f noise", which occurs very often in nature.
Equations

Probability density function:
f(x) = 1/(x·q), where q = log(max) − log(min)

Cumulative distribution function:
F(x) = [log(x) − log(min)]/q

Parameter restriction: 0 < min < max
Domain: min ≤ x ≤ max
Mean: (max − min)/q
Mode: min
Variance: (max − min)[max(q − 2) + min(q + 2)]/(2q²)
Skewness: √2·[2q²(max² + max·min + min²) − 9q(max² − min²) + 12(max − min)²] / {3·√(max − min)·[max(q − 2) + min(q + 2)]^(3/2)}
Kurtosis: [3q³(max⁴ − min⁴) − 16q²(max³ − min³)(max − min) + 36q(max² − min²)(max − min)² − 36(max − min)⁴] / {3(max − min)²[max(q − 2) + min(q + 2)]²}

Relative
VoseRelative(min, max, {xi}, {pi})

Graphs
The relative distribution is a non-parametric distribution (i.e. there is no underlying probability model) where {xi} is an array of x values with probability densities {pi} and where the distribution falls between the minimum and maximum. An example of the relative distribution is given below.

Uses
1. Modelling expert opinion. The relative distribution is very useful for producing a fairly detailed distribution that reflects an expert's opinion. The relative distribution is the most flexible of all of the continuous distribution functions. It enables the analyst and expert to tailor the shape of the distribution to reflect, as closely as possible, the opinion of the expert.
2. Modelling a posterior distribution in Bayesian inference. If you use the construction method of obtaining a Bayesian posterior distribution, you will have two arrays: a set of possible values in ascending order, and a set of posterior weights for each of those values. This exactly matches the input parameters for a relative distribution, which can then be used to generate values from the posterior distribution. Examples: turbine blades; fitting a Weibull distribution.
Equations

Probability density function: linear interpolation between the points {xi, pi}, normalised so that the total area under the curve is 1
Cumulative distribution function: obtained by integrating the piecewise linear density; no simple closed form

Parameter restriction: pi ≥ 0, xi < xi+1, n > 0, Σ_{i=1}^{n} pi > 0
Domain: min ≤ x ≤ max
Mean: No closed form
Mode: No closed form
Variance: No closed form
Skewness: No closed form
Kurtosis: No closed form

Slash
VoseSlash(q)

Graphs

Uses
The (standard) slash distribution is defined as the ratio of a standard normal and an exponentiated uniform(0, 1) distribution:

Slash(q) = Normal(0, 1) / Uniform(0, 1)^(1/q)

The slash distribution may be used for perturbations on a time series in place of a more standard normal distribution assumption, because it has longer tails. The q parameter controls the extent of the tails. As q approaches infinity, the distribution looks more and more like a normal.

Equations

Probability density function:
f(x) = q·∫₀¹ u^q·φ(ux) du, where φ(x) is the standard normal pdf

Cumulative distribution function:
F(x) = q·∫₀¹ u^(q−1)·Φ(ux) du, where Φ(x) is the standard normal cdf

Parameter restriction: q > 0
Domain: −∞ < x < +∞
Mean: 0   for q > 1
Mode: 0
Variance: q/(q − 2)   for q > 2


SplitTriangle
VoseSplitTriangle(low, medium, high, lowP, mediumP, highP)
Graphs

Uses
The split triangle is used to model expert opinion (see Chapter 14) where the subject matter expert (SME) is asked for three points on the distribution: for each point the SME provides a value of the variable and his or her view of the probability of the variable being less than that value. The split triangle then extrapolates from these values to create a distribution composed of two triangles, as in the figure above.

Equations
Probability density function:
f(x) = Height1·(x − min)/(mode − min)     if min ≤ x ≤ mode
f(x) = Height2·(max − x)/(max − mode)     if mode < x ≤ max
where Height1 = 2·mediumP/(mode − min)
Height2 = 2·(1 − mediumP)/(max − mode)
and min = (low − mode·√(lowP/mediumP))/(1 − √(lowP/mediumP))
mode = medium
max = (mode − high·√((1 − mediumP)/(1 − highP)))/(1 − √((1 − mediumP)/(1 − highP)))
Cumulative distribution function:
F(x) = 0                                            if x < min
F(x) = Height1·(x − min)²/(2(mode − min))           if min ≤ x ≤ mode
F(x) = 1 − Height2·(max − x)²/(2(max − mode))       if mode < x ≤ max
F(x) = 1                                            if max < x

Parameter restriction: min ≤ mode ≤ max, min < max
Domain: min < x < max
Mean: [mediumP·min + 2·mode + (1 − mediumP)·max]/3
Mode: mode
Variance: [3·mediumP·min² − 2·mediumP²·min² − 2·mediumP·min·mode + mode² − 2·mode·max + 2·mediumP·mode·max + max² + mediumP·max² − 2·mediumP²·max² − 4·mediumP·min·max + 4·mediumP²·min·max]/18
Skewness: Complicated
Kurtosis: Complicated

Step Uniform
VoseStepUniform(min, max, step)

Graph


Uses

The step uniform distribution returns values between the min and max at the defined step increments. If the step value is omitted, the function assumes a default value of 1.
The step uniform function is generally used as a tool to sample in a controlled way along some dimension (e.g. time, distance, x) and can be used to perform simple one-dimensional numerical integrations. By setting the step value to give a specific number of allowed values n,

Step = (max − min)/(n − 1)

and using Latin hypercube sampling with n iterations, we can ensure that each allowed value is sampled just once to get a more accurate integral.
StepUniform(A, B), where A and B are integers, will generate a random integer variable that can be used as an index variable, an efficient way to select paired data randomly from a database.
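As a rough Python sketch of this idea (mine, not from the text), evenly spaced samples over [min, max], which is the effect of a step uniform sampled once per allowed value, give a simple numerical estimate of an integral:

import numpy as np

def step_uniform_values(minimum, maximum, n):
    # The n equally likely values of a StepUniform with step = (max - min)/(n - 1)
    step = (maximum - minimum) / (n - 1)
    return minimum + step * np.arange(n)

# Example: integrate sin(x) over [0, pi] (exact answer 2) by sampling
# each allowed value exactly once, as Latin hypercube sampling would ensure.
x = step_uniform_values(0.0, np.pi, 201)
estimate = np.mean(np.sin(x)) * np.pi       # mean value of f times the range
print(estimate)                             # approximately 2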
Equations

Probability mass function:
f(x) = step/(max − min + step)   for x = min + i·step, i = 0 to (max − min)/step
f(x) = 0                          otherwise

Cumulative distribution function:
F(x) = 0                                                   for x < min
F(x) = (⌊(x − min)/step⌋ + 1)·step/(max − min + step)      for min ≤ x < max
F(x) = 1                                                   for max ≤ x

Parameter restriction: (max − min)/step must be an integer
Domain: min ≤ x ≤ max
Mean: (min + max)/2
Mode: Not uniquely defined
Variance: (max − min)(max − min + 2·step)/12
Skewness: 0
Kurtosis: (3/5)·[3(max − min)² + 2·step·(3·max − 3·min − 2·step)] / [(max − min)(max − min + 2·step)]


Student
VoseStudent(ν)
Graphs

Uses


The most common use of the Student distribution is for the estimation of the mean of an (assumed normally distributed) population where random samples from that population have been observed and the population's standard deviation is unknown. The relationship

(x̄ − μ)/(s/√n) = Student(n − 1)

where x̄ and s are the mean and standard deviation of the n observed samples, is at the centre of the method. This is equivalent to a t-test in classical statistics.
Other sample statistics can be approximated with a Student distribution, and thus a Student distribution can be used to describe one's uncertainty about the parameter's true value, in regression analysis, for example.
Comments

First derived by the English statistician William Sealy Gosset (1876-1937), whose employer (the brewery company Guinness) forbade employees from publishing their work, so he wrote a paper under the pseudonym "Student". As ν increases, the Student t-distribution tends to a Normal(0, 1) distribution.
Equations

Probability density function:
f(x) = Γ((ν + 1)/2) / [√(νπ)·Γ(ν/2)·(1 + x²/ν)^((ν+1)/2)]

Parameter restriction: ν is a positive integer
Domain: −∞ < x < +∞
Mean: 0   for ν > 1
Mode: 0
Variance: ν/(ν − 2)   for ν > 2
Skewness: 0   for ν > 3
Kurtosis: 3(ν − 2)/(ν − 4)   for ν > 4

Triangle
VoseTriangle(min, mode, max)

Graphs
The triangular distribution constructs a triangular shape from the three input parameters. Examples of the triangular distribution are given below.

Uses
The triangular distribution is used as a rough modelling tool where the range (min to max) and the most likely value within the range (mode) can be estimated. It has no theoretical basis but derives its statistical properties from its geometry.
The triangular distribution offers considerable flexibility in its shape, coupled with the intuitive nature of its defining parameters and speed of use. It has therefore achieved a great deal of popularity among risk analysts. However, min and max are the absolute minimum and maximum estimated values for the variable, and it is generally a difficult task to make estimates of these values.


Equations

Probability density function:
f(x) = 2(x − min)/[(mode − min)(max − min)]     if min ≤ x ≤ mode
f(x) = 2(max − x)/[(max − min)(max − mode)]     if mode < x ≤ max

Cumulative distribution function:
F(x) = 0                                             if x < min
F(x) = (x − min)²/[(mode − min)(max − min)]          if min ≤ x ≤ mode
F(x) = 1 − (max − x)²/[(max − min)(max − mode)]      if mode < x ≤ max
F(x) = 1                                             if max < x

Parameter restriction: min ≤ mode ≤ max, min < max
Domain: min ≤ x ≤ max
Mean: (min + mode + max)/3
Mode: mode
Variance: (min² + mode² + max² − min·mode − min·max − mode·max)/18
Skewness: (2√2/5)·z(z² − 9)/(z² + 3)^(3/2), where z = 2(mode − min)/(max − min) − 1
Kurtosis: 2.4

Uniform
VoseUniform(min, max)

Graphs
A uniform distribution assigns equal probability to all values between its minimum and maximum. Examples of the uniform distribution are given below, for example VoseUniform(1, 6).


Uses

1. Rough estimate. The uniform distribution is used as a very approximate model where there are very little or no available data. It is rarely a good reflection of the perceived uncertainty of a parameter, since all values within the allowed range have the same constant probability density, but that density abruptly changes to zero at the minimum and maximum. However, it is sometimes useful for bringing attention to the fact that a parameter is very poorly known.
2. Crude sensitivity analysis. Sometimes we want to get a rough feel for whether it is important to assign uncertainty to a parameter. You could give the parameter a uniform distribution with reasonably wide bounds, run a crude sensitivity analysis and see whether the parameter registers as having influence on the output uncertainty: if not, it may as well be left crudely estimated. The uniform distribution assigns the most (reasonable) uncertainty to the parameter, so, if the output is insensitive to the parameter with a uniform, it will be even more insensitive for another distribution.
3. Rare uniform variable. There are some special circumstances where a uniform distribution may be appropriate, for example a Uniform(0, 360) distribution for the angular resting position of a camshaft after spinning, or a Uniform(0, L/2) for the distance from a random leak in a pipeline of segments of length L to its nearest segment end (where you'd break the pipeline to get access inside).
4. Plotting a function. Sometimes you might have a complicated function you wish to plot for different values of an input parameter, or parameters. For a one-parameter function (like y = GAMMALN(ABS(SIN(x)/((x − 1)^0.2 + COS(LN(3 * x))))), for example), you can make two arrays: the first with the x values (say between 1 and 1000), and the second with the correspondingly calculated y values. Alternatively, you could write one cell for x (= Uniform(1, 1000)) and another for y using the generated x value, name both as outputs, run a simulation and export the generated values into a spreadsheet. Perhaps not worth the effort for one parameter, but when you have two or three parameters it is. Graphic software like S-PLUS will draw surface contours for {x, y, z} data arrays.
5. Uninformed prior. A uniform distribution is often used as an uninformed prior in Bayesian inference.

Equations

Probability density function:
f(x) = 1/(max − min)

Cumulative distribution function:
F(x) = (x − min)/(max − min)

Parameter restriction: min < max
Domain: min ≤ x ≤ max
Mean: (min + max)/2
Mode: No unique mode
Variance: (max − min)²/12
Skewness: 0
Kurtosis: 1.8


Weibull
VoseWeibull(α, β)
Graphs

Uses

The Weibull distribution is often used to model the time until occurrence of an event where the probability of occurrence changes with time (the process has "memory"), as opposed to the exponential distribution where the probability of occurrence remains constant ("memoryless"). It has also been used to model variation in wind speed at a specific site.

Comments

The Weibull distribution becomes an exponential distribution when α = 1, i.e. Weibull(1, β) = Exponential(β). The Weibull distribution is very close to the normal distribution when α = 3.25. The Weibull distribution is named after the Swedish physicist Dr E. H. Waloddi Weibull (1887-1979), who used it to model the distribution of the breaking strengths of materials.
Equations

Probability density function:
f(x) = α·β^(−α)·x^(α−1)·exp(−(x/β)^α)

Cumulative distribution function:
F(x) = 1 − exp(−(x/β)^α)

Parameter restriction: α > 0, β > 0
Domain: 0 ≤ x < +∞
Mean: (β/α)·Γ(1/α)
Mode: β·((α − 1)/α)^(1/α) for α > 1; 0 otherwise
Variance: (β²/α)·[2Γ(2/α) − (1/α)·Γ(1/α)²]
Skewness: [Γ(1 + 3/α) − 3Γ(1 + 1/α)Γ(1 + 2/α) + 2Γ(1 + 1/α)³] / [Γ(1 + 2/α) − Γ(1 + 1/α)²]^(3/2)
Kurtosis: [Γ(1 + 4/α) − 4Γ(1 + 1/α)Γ(1 + 3/α) + 6Γ(1 + 1/α)²Γ(1 + 2/α) − 3Γ(1 + 1/α)⁴] / [Γ(1 + 2/α) − Γ(1 + 1/α)²]²

III.7.2 Multivariate Distributions

Dirichlet
VoseDirichlet({α})

Uses

The Dirichlet distribution is the multivariate generalisation of the beta distribution. {VoseDirichlet} is input in Excel as an array function.
It is used in modelling probabilities, prevalence or fractions where there are multiple states to consider. It is the multinomial extension to the beta distribution for a binomial process.

Example 1
You have the results of a survey conducted in the premises of a retail outlet. The age and sex of 500 randomly selected shoppers were recorded:

<25 years, male:          38 people
25 to <40 years, male:    72 people
>40 years, male:         134 people
<25 years, female:        57 people
25 to <40 years, female: 126 people
>40 years, female:        73 people

In a manner analogous to the beta distribution, by adding 1 to each number of observations we can estimate the fraction of all shoppers to this store that are in each category as follows:

{Fractions} = VoseDirichlet({39, 73, 135, 58, 127, 74})

The Dirichlet then returns the uncertainty about the fraction of all shoppers that are in each group.

Example 2

A review of 1000 companies that were S&P AAA rated last year in your sector shows their rating 1 year later:

AAA:           908
AA:             83
A:               7
BBB or below:    2

If we assume that the market has similar volatilities to last year, we can estimate the probability that a company rated AAA now will be in each state next year as

{Probabilities} = VoseDirichlet({909, 84, 8, 3})

The Dirichlet then returns the uncertainty about these probabilities.
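A short Python sketch of Example 2 (an illustration only; NumPy's dirichlet sampler stands in for the spreadsheet array function) generates the joint uncertainty about the four transition probabilities:

import numpy as np

rng = np.random.default_rng()
counts = np.array([908, 83, 7, 2])            # observed transitions from AAA
alpha = counts + 1                            # add 1 to each, as described above

samples = rng.dirichlet(alpha, size=10_000)   # each row sums to 1
print(samples.mean(axis=0))                   # close to alpha / alpha.sum()
print(np.percentile(samples[:, 3], [2.5, 97.5]))   # uncertainty about P(BBB or below)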
Comments

The Dirichlet distribution is named after Johann Peter Gustav Lejeune Dirichlet. It is the conjugate to the multinomial distribution. The first value of a {Dirichlet(α1, α2)} = Beta(α1, α2).
Equations

The probability density function of the Dirichlet distribution of order K is

f(x) = (1/B(α))·∏_{i=1}^{K} x_i^(α_i − 1)

where x is a K-dimensional vector x = (x1, x2, ..., xK), α = (α1, ..., αK) is a parameter vector and B(α) is the multinomial beta function

B(α) = ∏_{i=1}^{K} Γ(α_i) / Γ(Σ_{i=1}^{K} α_i)

Parameter restrictions: α_i > 0. Domain: 0 ≤ x_i ≤ 1; Σ_{i=1}^{K} x_i = 1.

Inverse Multivariate Hypergeometric
VoseInvMultiHypergeo({s}, {D})
The inverse multivariate hypergeometric distribution answers the question: how many extra (wasted) random multivariate hypergeometric samples will occur before the required numbers of successes {s} are selected from each subpopulation {D}?
For example, imagine that our population is split up into four subgroups {A, B, C, D} of sizes {20, 30, 50, 10} and that we are going randomly to sample from this population until we have {4, 5, 2, 1} of each subgroup respectively. The number of extra samples we will have to make is modelled as

VoseInvMultiHypergeo({4, 5, 2, 1}, {20, 30, 50, 10})

The total number of trials that need to be performed is then this number of extra samples plus the number of required successes (4 + 5 + 2 + 1 = 12).
This is a univariate distribution, though belonging to a multivariate process; I have placed it in the multivariate section for easier comparison with the following distribution, the inverse multivariate hypergeometric 2, which responds to the same question but breaks down the number of extra samples into their subgroups.

Inverse Multivariate Hypergeometric 2
VoseInvMultiHypergeo2({s}, {D})
The second inverse multivariate hypergeometric distribution answers the question: how many extra (wasted) random multivariate hypergeometric samples will be drawn from each subpopulation before the required numbers of successes {s} are selected from each subpopulation {D}?
For example, imagine that our population is split up into four subgroups {A, B, C, D} of sizes {20, 30, 50, 10} and that we are going randomly to sample from this population until we have {4, 5, 2, 1} of each subgroup respectively. The number of extra samples we will have to make for each subpopulation A to D is modelled as the array function

{VoseInvMultiHypergeo2({4, 5, 2, 1}, {20, 30, 50, 10})}

Note that at least one category must be zero, since once the last category to be filled has the required number of samples the sampling stops, so for that category at least there will be no extra samples.
The inverse multivariate hypergeometric 2 responds to the same question as the inverse multivariate hypergeometric distribution but breaks down the number of extra samples into their subgroups, whereas the inverse multivariate hypergeometric simply returns the total number of extra samples.

Multivariate Hypergeometric
VoseMultiHypergeo(n, {D})
The multivariate hypergeometric distribution is an extension of the hypergeometric distribution to the situation where more than two different states of individuals in a group exist.

Example

In a group of 50 people, of whom 20 were male, a VoseHypergeo(10, 20, 50) would describe how many from 10 randomly chosen people would be male (and, by deduction, how many would therefore be female). However, let's say we have a group of 10 people split among four nationalities: German, English, French and Canadian.
Now, let's take a sample of four people at random from this group. We could have various numbers of each nationality in our sample,


and each combination has a certain probability. The multivariate hypergeometric distribution is an array distribution,
in this case generating simultaneously four numbers, that returns how many individuals in the random sample came
from each subgroup (e.g. German, English, French and Canadian).

Generation
The multivariate hypergeometric distribution is created by extending the mathematics of the hypergeometric distribution. The hypergeometric distribution models the number of individuals s in a sample of size n (randomly sampled from a population of size M) that come from a subgroup of size D (and therefore (n − s) will have come from the remaining subgroup (M − D)), as shown in the figure below of a total population M containing an infected subgroup D, from which n individuals are selected and s is the number infected in the selection. This results in the probability for s:

p(s) = C(D, s)·C(M − D, n − s) / C(M, n)

where C(a, b) denotes the binomial coefficient "a choose b". The numerator is the number of different sampling combinations (each of which has the same probability, because each individual has the same probability of being sampled) where one would have exactly s from the subgroup D (and, by implication, (n − s) from the subgroup (M − D)). The denominator is the total number of different combinations one could have in selecting n individuals from a group of size M. Thus, the equation is just the proportion of different possible scenarios, each of which has the same probability, that would give us s from D.
The multivariate hypergeometric probability equation is just an extension of this idea. The figure below is a graphical representation of the multivariate hypergeometric process, where D1, D2, D3 and so on are the numbers of individuals of different types in a population, and x1, x2, x3, ... are the numbers of successes (the numbers of individuals in our random sample (circled) belonging to each category).

This results in the probability distribution for {x}:

f({x1, x2, ..., xj}) = [∏_i C(D_i, x_i)] / C(M, n)

where Σ_i D_i = M and Σ_i x_i = n.

Equations

f({x1, x2, ..., xj}) = [∏_i C(D_i, x_i)] / C(M, n),   where M = Σ_i D_i

Parameter restriction: 0 < n ≤ M; n, M and the D_i are integers
Domain: max(0, n + D_i − M) ≤ x_i ≤ min(n, D_i)
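For illustration (not part of the text), NumPy's multivariate_hypergeometric sampler generates draws from this distribution directly; the subgroup sizes below reuse the {20, 30, 50, 10} example from earlier in this section, with a sample of n = 10:

import numpy as np

rng = np.random.default_rng()
D = [20, 30, 50, 10]        # subgroup sizes, summing to M = 110
n = 10                      # individuals selected without replacement

x = rng.multivariate_hypergeometric(D, n, size=5)
print(x)                    # each row sums to 10, one count per subgroup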

Multinomial
VoseMultinomial(N, {p})
The multinomial distribution is an array distribution and is used to describe how many independent trials will fall into each of several categories where the probability of falling into any one category is constant for all trials. As such, it is an extension of the binomial distribution, where there are only two possible outcomes ("successes" and, by implication, "failures").


Uses

For example, consider the action people might take on entering a shop:

Code   Action                                                    Probability
A1     Enter and leave without purchase or sample merchandise        32%
A2     Enter and leave with a purchase                               41%
A3     Enter and leave with sample merchandise                       21%
A4     Enter to return a product and leave without purchase           5%
A5     Enter to return a product and leave with a purchase            1%

If 1000 people enter a shop, how many will match each of the above actions?
The answer is {Multinomial(1000, {32%, 41%, 21%, 5%, 1%})}, which is an array function that generates five separate values. The sum of those five values must, of course, always add up to the number of trials (1000 in this example).
Equations

Probability mass function:
f({x1, x2, ..., xj}) = [n!/(x1!·x2!·...·xj!)]·p1^x1·p2^x2·...·pj^xj

Parameter restrictions: pi > 0, n ∈ {1, 2, 3, ...}, Σ pi = 1
Domain: xi = {0, 1, 2, ..., n}
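The shop example above is easy to reproduce in Python (a sketch, standing in for the spreadsheet array function):

import numpy as np

rng = np.random.default_rng()
p = [0.32, 0.41, 0.21, 0.05, 0.01]       # probabilities for actions A1..A5

counts = rng.multinomial(1000, p)        # one simulated day of 1000 shoppers
print(counts, counts.sum())              # five counts that always sum to 1000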

Multivariate Normal
VoseMultiNormal({μ}, {covmatrix})

Uses
A multinormal distribution, also sometimes called a multivariate normal distribution, is a specific probability distribution that can be thought of as a generalisation to higher dimensions of the one-dimensional normal distribution.
Equations

The probability density function of the multinormal distribution is the following function of an N-dimensional vector x = (x1, ..., xN):

f(x) = (2π)^(−N/2)·|Σ|^(−1/2)·exp(−(1/2)·(x − μ)ᵀ Σ⁻¹ (x − μ))

where μ = (μ1, ..., μN), Σ is the covariance matrix (N × N) and |Σ| is the determinant of Σ.
Parameter restrictions: Σ must be a symmetric, positive semi-definite matrix.

Negative Multinomial
VoseNegMultinomial({s}, {p})
The negative multinomial distribution is a generalisation of the negative binomial distribution. The NegBin(s, p) distribution estimates the total number of binomial trials that are failures before s successes are achieved, where there is a probability p of success with each trial.
For the negative multinomial distribution, instead of having a single value for s, we now have a set of success values {s} representing different "states" of successes (si) one can have, with each "state" i having a probability pi of success. This is a univariate distribution, though belonging to a multivariate process. I have placed it in the multivariate section for easier comparison with the following distribution.
Now, the negative multinomial distribution tells us how many failures we will have before we have achieved the total number of successes.

Example

Suppose you want to do a telephone survey about a certain product you made by calling people you pick randomly out of the phone book.
You want to make sure that at the end of the week you have called 50 people who have never heard of your product, 50 people who don't have the Internet at home and 200 people who use the Internet almost daily.
If you know the probabilities of success pi, the NegMultinomial({50, 50, 200}, {p1, p2, p3}) will tell you how many failures you'll have before you've called all the people you wanted, and so you also know the total number of phone calls you'll have to make to reach the people you wanted.
The total number of phone calls = the total number of successes (300) + the total number of failures (NegMultinomial({50, 50, 200}, {p1, p2, p3})).

Negative Multinomial 2
VoseNegMultinomial2({s}, {p})
The negative multinomial 2 is the same distribution as the negative multinomial but, instead of giving you the global number of failures before reaching a certain number of successes, the negative multinomial 2 gives you the number of failures in each "group" or "state".
So, in the example of the telephone survey (see the negative multinomial), where the total number of phone calls was equal to the total number of successes plus the total number of failures, the total number of failures would now be a sum of the number of failures in each group (three groups in the example).

III.8 Introduction to Creating Your Own Distributions
Risk analysis software offers a wide variety of probability distributions that cover most common risk analysis problems. However, one might occasionally wish to generate one's own probability distribution. There are several reasons why one would prefer to make one's own distributions, and four main ways to do this. The choice of which method to use will be determined by the following criteria:
Do you know the cdf or the pdf of the continuous distribution, or the pmf of the discrete distribution? If "yes", try method 1.
Do you know a relationship to another distribution provided by your software? If "yes", try method 2.
Do you have data from which you wish to construct an empirical distribution? If "yes", try method 3.
Do you have points on a curve you want to convert to a distribution? If "yes", try method 4.
In addition to creating your own distributions, sometimes it is very useful to approximate a distribution with another one, which is discussed in the following section.


III.8.1 Method 1: Generating Your Own Distribution When You Know the cdf, pdf or pmf
Situation
You wish to use a parametric probability distribution that is not provided by your software, and you know:
the cumulative distribution function (continuous variable);
the probability density function (continuous variable); or
the probability mass function (discrete variable).
This section describes the technique for each situation.
Known cumulative distribution function (cdf)
This method applies when you know the cdf of a continuous probability distribution. The algebraic equation of the cdf can often be inverted to make x the subject of the equation. For example, the cdf of the exponential probability distribution is

F(x) = 1 − exp(−x/β)     (III.4)

where β is the mean of the exponential distribution. Inverting equation (III.4) to make x its subject gives

x = −β·ln(1 − F(x))     (III.5)

A random sample from any continuous probability distribution has equal probability of lying in any equally sized range of F(x) between 0 and 1. For example, the variable X has a 10% probability of producing a value between x1 and x2 (x2 > x1), where F(x2) − F(x1) = 0.1. Looking at it the other way round, F(x) can be thought of as being a Uniform(0, 1) random variable. Thus, we can write equation (III.5) as an Excel formula to generate values from an exponential distribution with a mean of 23 as follows:

= −23 * LN(1 − Uniform(0, 1))     (III.6)

This is exactly equivalent to writing an exponential distribution with a mean of 23 directly in a cell. If we use Latin hypercube sampling (LHS) to draw from the Uniform(0, 1) distribution, we will get Latin hypercube samples from the constructed exponential distribution too. Similarly, if we force the uniform distribution to sample from its lower end (a technique necessary for advanced sensitivity analysis and stress analysis, for example), equation (III.6) will generate values from the lower end of the exponential distribution.
There is a disadvantage: the method hinges on being able to perform the algebra that inverts the cdf. If that proves too difficult, it may be easier to construct the cdf with a cumulative distribution.
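The same inversion is easy to sketch in Python instead of a spreadsheet (an illustration under the same assumptions, with a mean of 23):

import numpy as np

rng = np.random.default_rng()
beta = 23.0                                 # mean of the exponential

u = rng.uniform(0.0, 1.0, size=100_000)     # F(x) treated as Uniform(0, 1)
x = -beta * np.log(1.0 - u)                 # the inverted cdf, equation (III.5)
print(x.mean())                             # close to 23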
Known probability density function (pdf)
We may sometimes start with the pdf for a variable, determine the cdf by integration and then use the first method, although this generally requires good maths skills. Another reason one might like to determine the cdf could be an interest in the probability of the variable being below, between or above certain values. Integration of the pdf will give the cdf that can give you these probabilities. The example below shows you how to determine a cdf, starting with a pdf, in this case of a sine curve distribution.
Imagine that you want to design a distribution that follows the shape of a sine curve from 0 to a, where a is an input to the distribution. This distribution shape is shown in Figure III.3.

Figure III.3 The sine curve distribution we wish to create in our example.

The probability density function f(x) is given by

f(x) = b·sin(πx/a)

where b is a constant to be determined so that the area under the curve equals 1, as required for a probability distribution.
The cdf F(x) is then (0 < x < a)

F(x) = (ab/π)·(1 − cos(πx/a))     (III.7)

For the area under the curve to equal 1, b must be determined such that F(a) = 1, i.e.

2ab/π = 1

Therefore,

b = π/(2a)

and, from equation (III.7), F(x) becomes

F(x) = (1 − cos(πx/a))/2

We now need to find the inverse function to F(x). So, rearranging the equation above for x,

x = (a/π)·cos⁻¹(1 − 2F(x))

To generate this distribution, we put a Uniform(0, 1) distribution in cell A1 (say), the value for a in cell B1 (say) and, in the cell that generates values of x, we write

= B1/PI() * ACOS(1 − 2 * A1)

(The Excel function ACOS(y) returns cos⁻¹(y).)
If we use Latin hypercube sampling to generate the values from the Uniform(0, 1) distribution, we will generate a smoothly distributed set of x values that will replicate the desired distribution.
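The same construction in Python (a sketch, not from the text) makes it easy to check that the generated values follow the sine shape:

import numpy as np

def sine_curve_sample(a, size=1, rng=None):
    # Inverse-cdf sampling of f(x) = (pi/(2a)) * sin(pi*x/a) on [0, a]
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, size)            # plays the role of F(x)
    return (a / np.pi) * np.arccos(1.0 - 2.0 * u)

x = sine_curve_sample(a=10.0, size=100_000)
print(x.mean())        # should be close to a/2 = 5 by symmetry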


Known probability mass function (pmf)

If you know the pmf of a distribution, it is a simple matter (in principle) to create the distribution. The techniques above are not applicable because a cdf for a discrete variable is just the sum of the discrete probabilities, and it is thus not possible to construct an inverse transformation.
The method requires that you construct two arrays in Excel:
The first array is a set of values that the variable might take, e.g. {0, 1, ..., 99, 100}.
The second array is a set of probabilities using the pmf calculated for each of these values, {p(0), p(1), ..., p(99), p(100)}.
These two arrays are then used to construct the required distribution. For example, using ModelRisk's discrete distribution:

= VoseDiscrete({x}, {p(x)})

Of course, this method can become cumbersome if the {x} array is very large. In which case:
Make the {x} list with a spacing larger than 1, e.g. {0, 5, 10, ..., 495, 500}.
Calculate the associated probabilities {p(0), p(5), ..., p(495), p(500)}.
Construct a VoseRelative(min, max, {x}, {p(x)}) distribution, e.g. VoseRelative(−0.5, 500.5, {5, 10, ..., 490, 495}, {p(5), p(10), ..., p(490), p(495)}).
Wrap a ROUND function around the relative: = ROUND(VoseRelative(...), 0) to return a discrete value.
Note that using a minimum = −0.5 (min = −0.5) and a maximum = 500.5 (max = 500.5) will allocate a more accurate probability to the end values.
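A small Python sketch of the same idea (mine, not from the text): build the two arrays from a known pmf and sample from them directly, here for an assumed example pmf on {0, 1, ..., 100}:

import numpy as np

rng = np.random.default_rng()

x = np.arange(0, 101)                    # the values the variable might take
weights = (x + 1.0) ** -1.5              # an assumed example pmf shape (unnormalised)
p = weights / weights.sum()              # normalise so the probabilities sum to 1

samples = rng.choice(x, size=50_000, p=p)   # the constructed discrete distribution
print(samples.min(), samples.max())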

III.8.2 Method 2: Using a Known Relationship with Another Distribution
Sometimes you will know a direct relationship between the distribution you wish to simulate and another provided by your Monte Carlo simulation software; for example, one parameterisation of the LogWeibull distribution gives a simple relationship to the Weibull distribution.
If you tend to use sensitivity analysis tools (Sections 5.3.7 and 5.3.8), be careful that the two distributions increase or decrease together.
There are plenty of examples where a distribution is a mixture of two other distributions. For example:

Poisson-Lognormal(μ, σ) = Poisson(Lognormal(μ, σ))

Many of these relationships are described earlier in this Appendix. Again, be aware that by employing sensitivity analysis with a mixture distribution you can get misleading results: so, for example, the Lognormal, Gamma and Beta distributions in relationships like the one above will generate values that the Monte Carlo software will think of as belonging to a separate variable.


III.8.3 Method 3: Constructing an Empirical Distribution from Data
Situation

You have a set of random and representative observations of a single model variable, for example the number of children in American families (we'll look at a joint distribution for two or more variables at the end of this section), and you have enough observations to feel that the range and approximate random pattern have been captured. You want to use the data to construct a distribution directly.

Technique

It is unnecessary to fit a distribution to the data: instead one can simply use the empirical distribution of the data (if there are no physical or biological reasons a certain distribution should be used, we generally prefer an empirical distribution). Below, we outline three options you have to use these data to construct an empirical distribution:
1. A discrete uniform distribution - uses only the list of observed values.
2. A cumulative distribution - allows values between those observed and values beyond the observed range.
3. A histogram distribution - when you have huge amounts of data.

Option 1: A discrete uniform distribution

A discrete uniform distribution takes one parameter: a list of values. It then randomly picks any one of those values with equal probability (sampling with replacement). Thus, for example, = VoseDUniform({1, 4, 5, 7, 10}) will generate, with each iteration, one of the five values 1, 4, 5, 7 or 10 (each value has, during each iteration, a probability of being picked of 20%). Figure III.4 shows what the probability distribution looks like.
Let's imagine that we have our data in an array of cells called "Observations". By simply writing = VoseDUniform(Observations) we will generate a distribution that replicates the pattern of the observed data. You can use the discrete uniform distribution for both discrete and continuous data providing you have sufficient observations.
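In Python, the equivalent of this option is a simple resample-with-replacement of the observations (a sketch; the observations array is a made-up stand-in for your data):

import numpy as np

rng = np.random.default_rng()
observations = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4, 5])   # assumed example data

# Each iteration picks one observed value with equal probability (with replacement)
samples = rng.choice(observations, size=10_000, replace=True)
print(np.bincount(samples) / samples.size)   # replicates the observed pattern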
Option 2: A cumulative distribution

If your data are continuous, you also have the option of using a cumulative ascending distribution that takes four parameters: a minimum, a maximum, a list of values and a list of cumulative probabilities associated with those values. From these parameters it then constructs an empirical cumulative distribution by straight-line interpolation between the points defined on the curve. In ModelRisk there are two forms of the cumulative distribution: the cumulative ascending distribution and the cumulative descending distribution.
Our best guess of the cumulative probability of a data point in a set of observations turns out to be r/(n + 1), where r is the rank of the data point within the dataset and n is the number of observations. Thus, when choosing this option, one needs to:
rank the observations in ascending or descending order (Excel has an icon that makes this simple);
in a neighbouring column, calculate the rank of the data: write a column of values 1, 2, ..., n;
in the next column, calculate the cumulative probability F(x) = rank/(n + 1);
use the data and F(x) columns as inputs to the VoseCumulA or VoseCumulD distribution, together with subjective estimates of what the minimum and maximum values might be.
Note that the minimum and maximum values only have any effect on the very first and last interpolating lines used to create the cumulative distribution, and so the distribution is less and less sensitive to the values chosen as more data are used in its construction.
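The same r/(n + 1) construction in Python might look like this (a sketch; the inverse interpolation plays the role of the cumulative ascending distribution, and the data and end-points are made up):

import numpy as np

rng = np.random.default_rng()
data = np.sort(rng.gamma(2.0, 3.0, size=200))      # assumed continuous observations

n = data.size
F = np.arange(1, n + 1) / (n + 1)                  # cumulative probability rank/(n + 1)

# Add subjective minimum and maximum end-points to close the curve
xs = np.concatenate(([0.0], data, [data.max() + 5.0]))
Fs = np.concatenate(([0.0], F, [1.0]))

# Sample by inverting the piecewise-linear cdf
u = rng.uniform(0, 1, size=10_000)
samples = np.interp(u, Fs, xs)
print(samples.mean(), data.mean())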
Option 3: A histogram distribution

Sometimes (admittedly, not as often as we'd like) we have enormous amounts of random observations from which we would like to construct a distribution (for example, the generated values from another simulation). The discrete uniform and cumulative options above start to get a bit slow at that point, and model the variable in unnecessarily fine detail. A more practical approach now is to create a histogram of the data and use that instead. The array function FREQUENCY() in Excel will analyse a dataset and say how many values lie within any number of contiguous bin ranges. The ModelRisk distribution VoseHistogram has three parameters: the minimum possible value, the maximum possible value and an array of bin frequencies (or probabilities), which is just the FREQUENCY() array.
Option 4: Creating an empirical joint distribution for two or more variables

For data that are collected in sets (pairs, triplets, etc.) there may be correlation patterns inherent in the observations that we would like to maintain while fitting empirical distributions to data. An example is data on people's weight and height, where there is clearly some relationship between them. A combination of a step uniform distribution (with min = 1 and max = number of rows) with an Excel VLOOKUP() (slower) or OFFSET() (faster) function allows us to do this easily, as shown in the model in Figure III.5.

III.8.4 Method 4: Create a Distribution from a Set of Points on a Curve
Situation

We have a set of coordinates that we wish to use to construct a distribution:
{x, f(x)} for a continuous distribution, where f(x) is (or is proportional to) the probability density at value x;
{x, F(x)} for a continuous distribution, where F(x) is the cumulative probability (P(X ≤ x)) at value x; or
{x, p(x)} for a discrete distribution, where p(x) is (or is proportional to) the probability of value x.

Uses

There are many uses of this technique. For example:
converting the results of a constructed Bayesian inference calculation into a distribution;

Figure III.5 Using OFFSET or VLOOKUP functions to sample paired data.
constructing a spliced or mixed distribution by averaging or manipulating probability density functions for the component distributions;
knowing the pdf or pmf of a specific distribution not offered by your risk analysis software.

Application

We can use the same techniques as explained in method 2 to create distributions from a set of points:
If the dataset is of the form {x, f(x)}, we can use the VoseRelative function in ModelRisk.
If the dataset is of the form {x, F(x)}, we can use the VoseCumulA (or VoseCumulD) function in ModelRisk.
If the dataset is of the form {x, p(x)}, we can use the VoseDiscrete function in ModelRisk.
The three functions have similar formats:
= VoseRelative(min, max, {x}, {f(x)})
= VoseCumulA(min, max, {x}, {F(x)})
= VoseDiscrete({x}, {p(x)})
The {x} values must be in ascending order for the VoseRelative and VoseCumulA functions because they construct a distribution shape. For the VoseDiscrete function this is unnecessary because it is simply a list of values.


III.9 Approximation of One Distribution with Another
There are many situations where it is convenient, or just plain necessary, to approximate one distribution with another. For example, if I toss a coin 1 million times, how many heads will there be? The appropriate distribution is Binomial(1 000 000, 0.5), but such a distribution is utterly impractical to calculate. For a start, you would need to calculate every factorial for integers between 0 and a million. However, under certain conditions, the Binomial(n, p) distribution is very well approximated by a Normal(np, (npq)^(1/2)) distribution (where q = 1 − p). In our example, that would mean using a Normal(500 000, 500), and we could readily calculate, for example, the probability of having exactly 501 000 heads.
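As a rough check of that coin-tossing example in Python (an illustration using SciPy for the two distributions), the exact binomial probability and the normal approximation with a continuity correction agree closely:

from scipy.stats import binom, norm

n, p = 1_000_000, 0.5
mean, sd = n * p, (n * p * (1 - p)) ** 0.5          # 500 000 and 500

exact = binom.pmf(501_000, n, p)
approx = norm.cdf(501_000.5, mean, sd) - norm.cdf(500_999.5, mean, sd)
print(exact, approx)          # both about 1.1e-4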

This section looks at a number of approximations, why they work and how to use them. It also provides an interesting way to gain a better understanding of the interrelationships between a number of the most common distributions.
Before proceeding, it is worthwhile reminding ourselves of one of the most important theorems in probability theory, the central limit theorem. In one of its forms it states that the sum of n random variables, each of which is identically distributed, is normally distributed for large n. Moreover, if each of the random variables comes from a distribution with mean μ and standard deviation σ, the sum of these n random variables has a Normal(nμ, √n·σ) distribution. The theorem works for large n, but how large is "large"? The answer depends in part on the shape of the distribution of the individual random variables: if they are normally distributed, then n is 1 or more; if they are symmetric, then n > 10 or so; if they are moderately asymmetric, then n > 20-30 or so; and if they are very skewed indeed (for example, more skewed than an exponential distribution with skewness = 2.0), then n > 50 will be reasonably accurate. The answer, of course, also depends on how good a fit one is happy with. For some problems a rough fit may be perfectly adequate, and for other problems it may prove totally unacceptable. The central limit theorem is particularly useful in this section to explain why several distributions gravitate to the shape of a normal distribution under certain circumstances.

III.9.1 Approximations to the Binomial Distribution
The binomial distribution is a good starting point as it is the most fundamental distribution in probability theory. It models the number of successes s in n trials, where each trial has the same probability of success p. The probability of failure (1 − p) is often written as q to make the equations a bit neater.

Normal approximation to the binomial
When n is large, and p is neither very small nor very large, the following approximation works very well:

Binomial(n, p) ≈ Normal(np, √(npq))

The mean and standard deviation of a binomial distribution are np and (npq)^(1/2) respectively, so this approximation is quite easy to accept. It also fits nicely with the central limit theorem, because the Binomial(n, p) distribution can be thought of as the sum of n independent Binomial(1, p) distributions, each with mean p and standard deviation (pq)^(1/2).
The difficulty lies in knowing whether, for a specific problem, the values for n and p fall within the bounds for which the normal distribution is a good approximation. A Binomial(1, 0.5) is symmetric, so we can intuitively guess that one needs a fairly low value for n for the normal approximation to be reasonable when p = 0.5. On the other hand, a Binomial(1, 0.95) is very highly skewed, and we would reasonably expect that n would need to be large for the normal approximation to work for such an extreme value of p. An easy way to judge this is to think about the range that a normal distribution takes: almost all of its probability is contained within a range of ± three standard deviations from the mean. Now, we know that the binomial distribution is contained within the range [0, n]. It would therefore be reasonable to say that the normal distribution is a good approximation if its tails stay reasonably within these limits (Figure III.6), i.e.

np − 3√(npq) > 0   and   np + 3√(npq) < n

which simplify to

n > 9p/(1 − p)   and   n > 9(1 − p)/p

Figure III.6 Conditions for the normal approximation to the binomial.

A more stringent condition (using four instead of three standard deviations for the normal) would be to use 4 and 16 instead of 3 and 9 in the above equations. Figure III.6 shows how these two conditions work together symmetrically to show the (p, n) combinations that will work well. Decker and Fitzgibbon (1991) advise using the normal approximation when n^0.31·p > 0.47, which is discussed more below. At larger values of n, which is when one might wish to use an approximation, their rule of thumb is somewhat more conservative than that presented here, even using a range of ± three standard deviations from the mean.
The normal distribution is continuous while the binomial distribution is discrete, and this approximation leads to an additional error that can be avoided. Rather than use f(x) from the normal distribution to give the binomial distribution probability p(x), it is more accurate to use F(x + 0.5) − F(x − 0.5), where F(x) is the cumulative distribution function for the normal distribution. If one is simulating the distribution, the equivalent is to use a ROUND(..., 0) function around the normal distribution.

Poisson approximation to the binomial

The probability mass function for the Poisson distribution can be derived from the binomial probability mass function by making n extremely large while p becomes very small, but within the constraint that np remains finite, as shown in Section 8.3.1. Thus, the following approximation can be made to the binomial:

Binomial(n, p) ≈ Poisson(np)   when n → ∞, p → 0 and np remains finite

The Poisson approximation tends to overestimate the tail probabilities at both ends of the distribution. Decker and Fitzgibbon (1991) advise using this approximation when n^0.31·p < 0.47, and the normal approximation otherwise.


Figure III.7 The Poisson(1) distribution.

We can use a Poisson approximation to the binomial when p is close to 1, i.e. as (1 - p) → 0, by simply reflecting the formula. In this case, Binomial(n, p) ≈ n - Poisson(n(1 - p)), and the Decker-Fitzgibbon condition is then n^0.31 (1 - p) < 0.47.

III.9.2 Normal approximation to the Poisson distribution
The Poisson(λt) distribution describes the possible number of events that may occur in an exposure of t units, where the average number of events per unit of exposure is λ. A Poisson(λt) distribution is thus the sum of t independent Poisson(λ) distributions. We might intuitively guess then that, if λt is sufficiently large, a Poisson(λt) distribution will start to look like a normal distribution, because of the central limit theorem, as is indeed the case. A Poisson(1) distribution (see Figure III.7) is quite skewed, so we would expect to need to add together some 20 or so before the sum would look approximately normal.
The mean and variance of a Poisson(λt) distribution are both equal to λt. Thus, the normal approximation to the Poisson is given by

Poisson(λt) ≈ Normal(λt, (λt)^1/2)

A much more generally useful normal approximation to the Poisson distribution is given by the formula

This formula works for values of λ as low as 10.
The discrete property of the variable is lost with this approximation. The comments presented above for the normal approximation to the binomial also apply here for retrieving the discreteness and at the same time reducing error.
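For example, the discreteness can be retrieved in a simulation by rounding the normal variate, as the following sketch shows (my own check, assuming numpy is available; it compares the moments of a Poisson(25) sample with a rounded Normal(25, 5) sample):

# Compare a Poisson(lambda*t) sample with a rounded Normal(lambda*t, sqrt(lambda*t)) sample.
import numpy as np

rng = np.random.default_rng(1)
lam_t = 25.0
exact = rng.poisson(lam_t, 100_000)
approx = np.round(rng.normal(lam_t, np.sqrt(lam_t), 100_000))

print("Poisson mean, variance:        ", exact.mean(), exact.var())
print("Rounded-normal mean, variance: ", approx.mean(), approx.var())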

III.9.3 Approximations to the hypergeometric distribution
The Hypergeometric(n, D, M) distribution describes the possible number of successes one may have in n trials,
where a trial is a sample without replacement from a population of size M, and where a success is defined as
picking one of the D items in the population of size M that have some particular characteristic. So, for example,
the number of infected animals in a sample of size n, taken from a population M, where D of that population are
known to be infected, is described by a Hypergeometric(n, D, M) distribution. The probability mass function for
the hypergeometric distribution is a mass of factorial calculations, which is quite laborious to calculate and leads
us to look for suitable approximations.

Figure III.8 Examples of the Poisson approximation to the hypergeometric.

Binomial approximation to the hypergeometric

The hypergeometric distribution recognises the fact that we are sampling from a finite population without replacement, so that the result of a sample is dependent on the samples that have gone before it. Now imagine that the
population is very large, so that removing a sample of size n has no discernible effect on the population. Then
the probability that an individual sample will have the characteristic of interest is essentially constant and has the value D/M, because the probability of resampling an item in the population, were one to replace items after sampling, would be very small. In such cases, the hypergeometric distribution can be approximated by a binomial as follows:

Hypergeometric(n, D, M) ≈ Binomial(n, D/M)

The rule most often quoted is that this approximation works well when n < 0.1M.
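The quality of this approximation is easy to check numerically. The sketch below (my own, using scipy; note that scipy's hypergeom takes the population size, the number of marked items and the sample size, in that order) compares the two distributions for a case with n < 0.1M:

# Hypergeometric(n, D, M) versus its Binomial(n, D/M) approximation when n < 0.1*M.
from scipy.stats import hypergeom, binom

M, D, n = 10_000, 300, 50
for k in range(5):
    exact = hypergeom.pmf(k, M, D, n)        # scipy order: (k, population, D, sample size)
    approx = binom.pmf(k, n, D / M)
    print(f"k={k}: hypergeometric={exact:.5f}, binomial approximation={approx:.5f}")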
Poisson approximation to the hypergeometric

We have just seen how the hypergeometric distribution can be approximated by the binomial, providing n < 0.1M. We have also seen in the previous section that the binomial can be approximated by the Poisson distribution, providing n is large and p is small. It therefore follows that, where n < 0.1M and where D/M is small, we can use the following approximation:

Hypergeometric(n, D, M) ≈ Poisson(nD/M)

Figure III.8 illustrates two examples.
Normal approximation to the hypergeometric

When n < 0.1M, so that the binomial approximation to the hypergeometric is valid, and when that binomial distribution looks similar to a normal, we can use the normal approximation to the hypergeometric. This amounts to three conditions:

n > 9(M - D)/D,   n > 9D/(M - D)   and   n < M/10


Figure III.9 Conditions for the normal approximation to the hypergeometric (acceptable combinations of n and D for M = 1000).

in which case we can use the approximation

Hypergeometric(n, D, M) ≈ Normal(nD/M, [n(D/M)(1 - D/M)]^1/2)

Figure III.9 illustrates how the three conditions on n, D and M combine to determine the region in which this approximation is valid. Figure III.10 shows some examples of hypergeometric distributions, taking the parameter values indicated by the diamonds in Figure III.9, which fall inside and outside the allowed region for the normal approximation. For the normal approximation to the binomial, we originally chose the condition that the mean np should be at least three standard deviations (npq)^1/2 away from both 0 and n. However, we could be more stringent
in our conditions and make it four standard deviations instead of three. For the conditions above, that would mean
replacing each 9 with 16. Again, the discrete property of the variable is lost with this approximation, and the
comments on correcting this that were presented above for the normal approximation to the binomial also apply.
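A small helper (my own sketch, not from the text) makes the three conditions above easy to test before relying on the approximation:

# The three conditions above for the normal approximation to Hypergeometric(n, D, M).
def normal_ok_for_hypergeometric(n: int, D: int, M: int) -> bool:
    return n > 9 * (M - D) / D and n > 9 * D / (M - D) and n < M / 10

print(normal_ok_for_hypergeometric(150, 300, 1000))   # False: n is not < M/10
print(normal_ok_for_hypergeometric(80, 500, 1000))    # True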

III.9.4 Approximations to the negative binomial distribution
The negative binomial distribution has a probability mass function that includes a binomial coefficient, and therefore
factorials that, like the binomial distribution, make it laborious or impossible to calculate directly. Approximations
to the negative binomial are thus very useful.
Normal approximation to the negative binomial

The negative binomial distribution NegBin(s, p) returns the number of failures one will have in a binomial process before observing the sth success, where each trial has a probability p of success. A NegBin(1, p) is therefore the number of failures before observing the first success. If one wanted to observe two successes, one would wait until the first success with attendant NegBin(1, p) failures, then wait for the second success, with another attendant NegBin(1, p) number of failures. Extending this, it is easy to see how a NegBin(s, p) is simply the sum of s independent NegBin(1, p) distributions. This is another ideal candidate for the central limit theorem. The mean and standard deviation for a NegBin(1, p) distribution are (1 - p)/p and (1 - p)^1/2/p respectively.

Figure III.10 Examples of the normal approximation to the hypergeometric inside and outside the conditions suggested in Figure III.9.

The NegBin(1, p) distribution has a skewness of at least 2.0 (for very low p), so we would expect to need to add 50 or so of these NegBin(1, p) distributions to reach something that looks fairly normally distributed. So, when s > 50, we can use the approximation
NegBin(s, p) ≈ Normal(s(1 - p)/p, [s(1 - p)]^1/2/p)

The NegBin(s, p) distribution has the same mean and standard deviation of s(1 - p)/p and [s(1 - p)]^1/2/p respectively, which gives a useful check to this approximation. Again, the discrete property of the variable is lost with this approximation, and the comments on correcting this that were presented above for the normal approximation to the binomial also apply.
Exponential and gamma approximations to the negative binomial

As the probability of success of a trial p tends to zero, the binomial process becomes a Poisson process, as described in Section 8.3.1. Now, since p is so small, we will need a very large number of trials before observing a success. The Poisson process provides the exponential distribution Expon(β) as the "time" we will have to wait until our first observation, and the gamma distribution Gamma(x, β) as the "time" we will have to wait to observe x events. Here, β is the average time we will wait before observing an event, which, for the binomial process, equates to the average number of trials before the first success, i.e. the mean of the NegBin(1, p) distribution, which is (1 - p)/p and tends to 1/p as p gets small. So, by comparison, we can see that the following approximation will work for small p:

NegBin(1, p) ≈ Expon(1/p)
NegBin(s, p) ≈ Gamma(s, 1/p)

In fact, the first of these two approximations is redundant since Gamma(1, β) = Expon(β) anyway.
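The correspondence is easy to see numerically; the sketch below (my own, using scipy, whose nbinom also counts failures before the sth success) compares quantiles of NegBin(s, p) and Gamma(s, 1/p) for a small p:

# NegBin(s, p) versus its Gamma(s, 1/p) approximation for small p.
from scipy.stats import nbinom, gamma

s, p = 5, 0.01
for q in (0.05, 0.5, 0.95):
    exact = nbinom.ppf(q, s, p)              # negative binomial quantile
    approx = gamma.ppf(q, s, scale=1 / p)    # matching gamma quantile
    print(f"quantile {q}: NegBin = {exact:.0f}, Gamma approximation = {approx:.1f}")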


III.9.5 Normal approximation to the gamma distribution
The Gamma(α, β) distribution returns the "time" we will have to wait before observing α independent Poisson events, where one has to wait on average β units of "time" between each event. The "time" to wait before a single event occurs is a Gamma(1, β) = Expon(β) distribution, with mean β and standard deviation β too. The Gamma(α, β) is thus the sum of α independent Expon(β) distributions, so the central limit theorem tells us that for sufficiently large α (>30, for example, from the discussion of the central limit theorem at the beginning of this section) we can make the approximation

Gamma(α, β) ≈ Normal(αβ, α^1/2 β)

The Gamma(α, β) distribution has mean and standard deviation equal to αβ and α^1/2 β respectively, which provides a nice check to our approximation.

III.9.6 Normal approximation to the lognormal distribution
When the Lognormal(μ, σ) distribution has an arithmetic mean μ that is much larger than its arithmetic standard deviation σ, the distribution tends to look like a Normal(μ, σ), i.e.

Lognormal(μ, σ) ≈ Normal(μ, σ)

A general rule of thumb for this approximation is μ > 6σ. This approximation is not really useful from the point of view of simplifying the mathematics, but it is helpful in being able quickly to think of the range of the distribution and its peak in such circumstances. For example, I know that 99.7 % of a normal distribution is contained within a range of ± 3σ from the mean μ. So, for a Lognormal(15, 2), I would estimate it would be almost completely contained within a range [9, 21] and that it would peak at a little below 15 (remember that the mode, median and mean appear in that order from left to right for a right-skewed distribution).
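That back-of-the-envelope estimate is easy to verify; the sketch below (my own check, assuming scipy is available) converts the arithmetic mean and standard deviation into the underlying normal parameters and confirms the range and mode:

# Check the Lognormal(15, 2) example: range [9, 21] and a mode a little below 15.
import numpy as np
from scipy.stats import lognorm

m, s = 15.0, 2.0
sigma = np.sqrt(np.log(1 + (s / m) ** 2))    # sd of the underlying normal
mu = np.log(m) - sigma ** 2 / 2              # mean of the underlying normal
dist = lognorm(s=sigma, scale=np.exp(mu))

print("P(9 < X < 21) =", dist.cdf(21) - dist.cdf(9))   # roughly 0.995
print("mode =", np.exp(mu - sigma ** 2))               # about 14.6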

III.9.7 Normal approximation to the beta distribution
The beta distribution is difficult to calculate, involving a beta function in its denominator, so an approximation is often welcome. A Taylor series expansion of the beta distribution probability density function, described in Example 9.5 of Section 9.2.5, shows that the Beta(α1, α2) distribution can be approximated by the normal distribution when α1 and α2 are sufficiently large. More specifically, the conditions are

A pretty good rule of thumb is that α1 and α2 should both equal 10 or more, but they can be as low as 6 if α1 ≈ α2.
In such cases, an approximation using the normal distribution works well where we use the mean and standard deviation from the exact beta distribution:

Beta(α1, α2) ≈ Normal(α1/(α1 + α2), [α1α2/((α1 + α2)^2 (α1 + α2 + 1))]^1/2)

III.9.8 Normal approximation to the chi-square distribution
The chi-square distribution ChiSq(ν) is quite easy to calculate, but for large ν it can be approximated by a normal distribution. The ChiSq(ν) distribution is the sum of ν independent (Normal(0, 1))^2 distributions, so ChiSq(α) + ChiSq(β) = ChiSq(α + β). A (Normal(0, 1))^2 = ChiSq(1) distribution is highly skewed (skewness = 2.83). The central limit theorem says that ChiSq(ν) will look approximately normal when ν is rather large. A good rule of thumb is that ν > 50 or so to get a pretty good fit. In such cases, we can make the following approximation by matching moments (i.e. using the mean and standard deviation of a ChiSq(ν) distribution in a normal distribution):

ChiSq(ν) ≈ Normal(ν, (2ν)^1/2)

The ChiSq(ν) distribution peaks at x = ν - 2, whereas the normal approximation peaks at ν, so acceptance of this approximation depends on being able to allow such a shift in the mode.

III.9.9 Normal approximation to the Student t-distribution
The Student t-distribution Student(ν) is quite difficult to calculate, but when ν is large it is well approximated by a normal distribution, as follows:

Student(ν) ≈ Normal(0, 1)

A general rule for this approximation is that ν > 30. The relationship between the Student t and normal distributions is not intuitively obvious but is readily made apparent by doing a Taylor series expansion about x = 0 of the probability density function for the Student distribution.

III.10 Recursive Formulae for Discrete Distributions
The Binomial(10^6, 10^-6) distribution is shown in Figure III.11. Although in theory it extends from 0 to 10^6, in practice it is very much constrained to the lowest values. Calculating the probabilities for each possible value of this distribution is a problem, since one needs to evaluate 10^6!, which is beyond the capacity of most computers (Excel will calculate factorials up to 170, for example). The Stirling formula (see Section 6.3.4) can be used to obtain a very good approximation of high factorials, but we still end up with the problem of having to deal with manipulating very high numbers. An easier approach is to use recursive formulae. These formulae relate the equation for the probability of the (i + 1)th value to the probability of the ith value. Then one simply has to calculate the probability of any one value explicitly (a value chosen to give the simplest calculation) and thereafter use the recursive formula to determine all other probabilities.
The binomial probability mass function gives

p(i) = [n! / (i!(n - i)!)] p^i (1 - p)^(n-i)

and

p(i + 1) = [n! / ((i + 1)!(n - i - 1)!)] p^(i+1) (1 - p)^(n-i-1)

Thus,

p(i + 1) = p(i) (n - i)p / ((i + 1)(1 - p))


Figure III.11 The Binomial(10^6, 10^-6) distribution.

The binomial probability of zero successes is easily calculated:

p(0) = (1 - p)^n

so p(1), p(2), etc., can be readily determined using the recursive formula

p(i + 1) = p(i) (n - i)p / ((i + 1)(1 - p))

If this binomial distribution is needed for a simulation, it is a simple enough task to use the x values and calculated probabilities as inputs into a discrete distribution, as shown in Figure III.12. The discrete distribution then acts as if it were the required binomial distribution.
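A direct translation of this scheme into Python (a sketch of my own, mirroring the spreadsheet formulae rather than reproducing the book's model) builds the probabilities recursively and samples from them with a discrete distribution:

# Recursive Binomial(10**6, 10**-6) probabilities and a discrete sampler.
import numpy as np

n, p = 10**6, 10**-6
k_max = 10                                   # the distribution is negligible beyond here
probs = [(1 - p) ** n]                       # p(0)
for i in range(k_max):
    probs.append(probs[-1] * (n - i) * p / ((i + 1) * (1 - p)))

rng = np.random.default_rng(7)
weights = np.array(probs) / sum(probs)       # renormalise the truncated tail
samples = rng.choice(np.arange(k_max + 1), size=10_000, p=weights)
print([round(q, 7) for q in probs])
print("simulated mean:", samples.mean())     # close to n*p = 1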
If the binomial probability in this example had been extremely high, say 0.999 99, instead of 0.000 01, we would use the same technique but calculate backwards from p(1 000 000), i.e.

p(1 000 000) = p^1 000 000

and work backwards using

p(i - 1) = p(i) i(1 - p) / ((n - i + 1)p)

x     Binomial probability f(x)
0     3.7E-01
1     3.7E-01
2     1.8E-01
3     6.1E-02
4     1.5E-02
5     3.1E-03
6     5.1E-04
7     7.3E-05
8     9.1E-06
9     1.0E-06
10    1.0E-07

Formulae table
C6            =(1-p)^n
C7:C16        =(n-B6)*p*C6/(B7*(1-p))
F11 (output)  =Discrete(B6:B16,C6:C16)

Figure III.12 Model example of using a recursive formula to generate a binomial distribution.

Here are other useful recursive formulae for some of the most common discrete probability distributions:

Poisson(λ):   p(0) = exp(-λ),   p(i + 1) = p(i) λ/(i + 1)

Negative binomial NegBin(s, p):   p(0) = p^s,   p(i + 1) = p(i) (s + i)(1 - p)/(i + 1)

Geometric(p):   p(0) = p,   p(i + 1) = p(i) (1 - p)

Hypergeometric(n, D, M):   p(i + 1) = p(i) (n - i)(D - i) / ((i + 1)(M - D - n + i + 1))

The formula for p(0) for the hypergeometric distribution is unwieldy but can still be very accurately approximated
without resorting to factorial calculations by using the Stirling formula (see Section 6.3.4) to give

This last formula will usually have to be calculated in log space first, then converted to real space at the end in
order to avoid intermediary calculations of very large numbers. Another formula for p(0) that is only very slightly
less accurate for larger M and generally more accurate for small n is provided by Cannon and Roe (1982):


This formula is produced by expanding the factorials for the equation for p(0) and cancelling common factors top and bottom to give

p(0) = [(M - n)!(M - D)!] / [M!(M - D - n)!]

We then take the first and last terms in each of the numerator and denominator, average the two terms and raise to
the power of the number of terms (D) top and bottom:

An alternative formulation is obtained by swapping around n and D to give

This works since Equation (III.9) is symmetric in n and D, i.e. if n and D swap places the equation remains the same. Equation (III.8) is a better approximation when n > D, and Equation (III.10) is better when n < D.
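When the exact value of p(0) is wanted, the log-space advice above is easy to follow with log-gamma functions; the sketch below (my own illustration, not the book's spreadsheet) evaluates p(0) = (M - n)!(M - D)!/(M!(M - D - n)!) without forming any huge factorials:

# Exact hypergeometric p(0) evaluated in log space via lgamma.
from math import lgamma, exp

def hypergeom_p0(n: int, D: int, M: int) -> float:
    log_p0 = (lgamma(M - n + 1) + lgamma(M - D + 1)
              - lgamma(M + 1) - lgamma(M - D - n + 1))
    return exp(log_p0)

print(hypergeom_p0(n=50, D=300, M=10_000))   # probability of zero successes in the sample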

III.11 A Visual Observation on the Behaviour of Distributions
During the courses I run on risk analysis modelling, I often get up in front of the class and ask them to imagine
that I am some generic distribution: my head is the peak and my arms, extended to the sides, form the tails of the
distribution. I stand in the middle of the room and proclaim myself to be "normal". My outstretched arms do not
perceive any restrictions as they do not touch the walls of the room. I then walk towards one side of the room
until an outstretched hand touches a wall. I start to contort, losing my symmetry since I am constrained not to pass
through the wall, my unfettered arm remaining outstretched. At this point, I proclaim that I am starting to look
"lognormal". I continue to approach the wall until my head (the distribution's peak) touches it. I suddenly shoot
up my greatly restricted arm above myself and declare that I now look "exponential" - actually, I look more like
the white-suited John Travolta in his famous cinematic pose.
The exercise may make me look like a particularly eccentric Englishman, but it is memorable and gets the point
across (after all, probability theory can be rather a dry subject). The idea is that one can quite easily picture the
behaviour of a lot of distributions that can change shape. If the distribution is centred far from its boundaries, with
a small enough spread that it does not "see" those boundaries, it very often looks normal. On the other hand, if
the distribution "sees" one of its boundaries, it will often be skewed away from that limit and takes on a shape we
usually think of as being "lognormal". Finally, if the distribution has its mode at, or very close to, an extreme, it
will often look similar to an exponential distribution.

Appendix IV

Further reading
I always appreciate it when someone personally recommends a good book, and "good" for me, in the context of
risk analysis, is one that goes light on the theory, unless you need it, and focuses on practical applications. Here is
a list of books from our library that I particularly like, organised by topical area. The number of stars represents
how intellectually challenging they are:
1 star = somewhat simpler than this book;
2 stars = about the same as this book;
3 stars = suitable for researchers and masochists.

Some books carry one star because they are very accessible, but they may nonetheless represent the highest level
in their field. For the very latest in modelling techniques, particularly the development of financial models and
numerical techniques, we mostly rely on the web and journal articles. If you have some book suggestions for me,
please send me an email (David@voseconsulting.com), or a copy if it's your own book (you can find our office
addresses at our website www.voseconsulting.com), and if we like it we'll put it up on the website.
Simulation
Banks, J., Carson, J. S., Nelson, B. L. and Nicol, D. M. (2005). Discrete-event System Simulation, 4th edition.
Upper Saddle River, NJ: Prentice-Hall.**
Evans, J. R. and Olson, L. (1998). Introduction to Simulation and Risk Analysis. Upper Saddle River, NJ: Prentice-Hall. [Simple, and with lots of illustrations.]*
Law, A. M. and Kelton, W. D. (2000). Simulation Modeling and Analysis, 3rd edition. New York, NY: McGraw-Hill.**
Morgan, M. G. and Henrion, M. (1990). Uncertainty. Cambridge, UK: Cambridge University Press. [Old now, but
has aged well.]**
Business risk and decision analysis
Clemen, R. T. and Reilly, T. (2001). Making Hard Decisions. Pacific Grove, CA: Duxbury Press.*
Goodwin, P. and Wright, G. (1998). Decision Analysis for Management Judgment. New York, NY: John Wiley &
Sons Inc.*
Raiffa, H. and Schlaiffer, R. (2000). Applied Statistical Decision Theory. New York, NY: John Wiley & Sons
Inc.***
Schuyler, J. (2001). Risk and Decision Analysis in Projects. Pennsylvania, USA: Project Management Institute.**
Extreme value theory
Gumbel, E. J. (1958). Statistics of Extremes. New York, NY: Columbia University Press.**
Embrechts, P., Klupperberg, C. and Mikosch, T. (1999). Modelling Extreme Events for Insurance and Finance.
New York, NY: Springer-Verlag.**

Insurance


Daykin, C. D., Pentikainen, T . and Pesonen, M. (1994). Practical Risk Theory for Actuaries. New York, NY:
Chapman and Hall.**
Dickson, D. C. M. (2005). Insurance: Risk and Ruin. Cambridge, UK: Cambridge University Press.***
Embrechts, P., Klupperberg, C. and Mikosch, T . (1999). Modelling Extreme Events for Insurance and Finance.
New York, N Y : Springer.**
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences. New York, NY:
John Wiley & Sons Inc.**
Klugman, S. A., Panjer, H. H. and Willmot, G. E. (1998). Loss Models: from Data to Decisions. New York, NY:
John Wiley & Sons Inc. [Very theoretical, but thorough.]***
Financial risk

Brealey, R. M. and Myers, S. C. (2000). Principles of Corporate Finance, 6th edition. New York, NY: McGraw-Hill.**
Cherubini, U., Luciano, 0. and Vecchiato, W . (2004). Copula Methods in Finance. New York, N Y : John Wiley &
Sons Inc. [A really great book.]**
Clewlow, L. and Strickland, C. (1998). Implementing Derivatives Models. New York, NY: John Wiley & Sons Inc.**
Cox, J . C., Ross, S. and Rubenstein, M. (1979). Option pricing: a simplified approach. J. Financial Economics 7 ,
229-263.***
Cox, J. C. and Rubinstein, M. (1985). Options Markets. Englewood Cliffs, NJ: Prentice-Hall.***
De Servigny, A. and Renault, 0. (2004). Measuring and Managing Credit Risk. New York, N Y : Standard &
Poor's, a division o f McGraw-Hill. [A really good practical perspective.]**
Dunis, C. (1996). Forecasting Financial Markets. New York, N Y : John Wiley & Sons Inc.**
Dunis, C. and Zhou, B. (1998). Nonlinear Modelling of High Frequency Financial Time Series. New York, NY:
John Wiley & Sons Inc.***
Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering. New York, N Y : Springer-Verlag.***
Hull, J . C. (2006). Options, Futures, and other Derivative Securities, 6th edition. Englewood Cliffs, NJ:
Prentice-Hall. [A classic.]**
Jackson, M. and Staunton, M. (2001). Advanced Modelling in Finance Using Excel and VBA. New York, NY:
John Wiley & Sons Inc.***
James, J. and Webber, N. (2000). Interest Rate Modelling. New York, N Y : John Wiley & Sons Inc.***
London, J. (2006). Modeling Derivatives Applications in Matlab, C++, and Excel. Upper Saddle River, NJ: FT
Press.***
McNeil, A. J., Riidiger, F. and Embrechts, P. (2005). Quantitative Risk Management. Princeton, NJ: Princeton
University Press. [Pretty theoretical but very thorough.]***
Schonbucher, P. J . (2003). Credit Derivatives Pricing Models. New York, N Y : John Wiley & Sons Inc.***
Wilmott, P. (1998). The Theory and Practice of Financial Engineering. New York, N Y : John Wiley & Sons Inc.***
Wilmott, P. (1998). Derivatives. Chichester, UK: John Wiley & Sons Ltd.***
Wilmott, P. (2001). Paul Wilmott Introduces Quantitative Finance, New York, N Y : John Wiley & Sons Inc. [In
fact, anything that Paul produces has been great so far.]***
The bootstrap

Chernick, M . R. (1999). Bootstrap Methods - a Practitioner's Guide. New York, N Y : John Wiley & Sons Inc.**
Davison, A. C. and Hinkley, D. V . (1997). Bootstrap Methods and their Applications. Cambridge, U K : Cambridge
University Press.**
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York, N Y : Chapman and Hall.***


General statistics and probability theory

Barnett, V. (1999). Comparative Statistical Inference, 3rd edition. New York, NY: John Wiley & Sons Inc.**
Good, P. and Hardin, J. W. (2003). Common Errors in Statistics (and How to Avoid Them). New York, NY: John Wiley & Sons Inc.*
Evans, M., Hastings, N. and Peacock, B. (1993). Statistical Distributions. New York, NY: John Wiley & Sons Inc.*
Feller, W. (1966). An Introduction to Probability Theory and its Applications, 2nd edition. New York, NY: John Wiley & Sons Inc.***
Gilks, W. R., Richardson, R. and Spiegelhalter, D. J. (1996). Markov Chain Monte Carlo in Practice. New York, NY: Chapman and Hall.***
Groebner, D. E. and Shannon, P. F. (1993). Business Statistics: a Decision-making Approach. New York, NY: Macmillan.*
Jensen, F. V. (2001). Bayesian Networks and Decision Graphs. New York, NY: Springer-Verlag.***
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1994). Continuous Univariate Distributions. New York, NY: John Wiley & Sons Inc.**
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1997). Discrete Multivariate Distributions. New York, NY: John Wiley & Sons Inc.**
Johnson, N. L., Kotz, S. and Kemp, A. W. (1993). Univariate Discrete Distributions. New York, NY: John Wiley & Sons Inc.**
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences. New York, NY: John Wiley & Sons Inc.**
Kloeden, P. E., Platen, E. and Schurz, H. (1994). Numerical Solution of SDE Through Computer Experiments. New York, NY: Springer-Verlag.***
Kotz, S., Balakrishnan, N. and Johnson, N. L. (2000). Continuous Multivariate Distributions, 2nd edition. New York, NY: John Wiley & Sons Inc.**
Kreyszig, E. (1993). Advanced Engineering Mathematics, 7th edition. New York, NY: John Wiley & Sons Inc. [Not specifically about probability, but an excellent maths refresher.]***
Levin, R. I. and Rubin, D. S. (1994). Statistics for Management. Englewood Cliffs, NJ: Prentice-Hall.*
Lipschutz, S. (1974). Probability, Schaum's Outline Series. New York, NY: McGraw-Hill.*
McClave, J. T., Dietrich, F. H. and Sincich, T. (1997). Statistics, 7th edition. Upper Saddle River, NJ: Prentice-Hall.*
Norman, G. R. and Streiner, D. (2000). Biostatistics: the Bare Essentials. Toronto, ON: BC Decker. [A statistics book with jokes!]*
Ross, S. M. (1976). A First Course in Probability. New York, NY: Macmillan.*
Ross, S. M. (1997). Introduction to Probability Models, 6th edition. Boston, MA: Academic Press.**
Taylor, H. M. and Karlin, S. (1998). An Introduction to Stochastic Modeling, 3rd edition. London, UK: Academic Press.**

Project risk analysis

Chapman, C. and Ward, S. (1997). Project Risk Management. Chichester, UK: John Wiley & Sons Ltd.*
Grey, S. (1995). Practical Risk Assessment for Project Management. New York, NY: John Wiley & Sons Inc.*
Kerzner, H. (2001). Project Management: a Systems Approach to Planning, Scheduling and Controlling, 7th edition. New York, NY: John Wiley & Sons Inc. [Not really risk analysis, but good planning leads to less risk.]*
Simon, P. (ed.) (1997). Project Risk Analysis and Management. Norwich, UK: APM.*


Reliability theory

Bentley, J. P. (1999). Reliability and Quality Engineering. Harlow, UK: Addison-Wesley. [Short, good, simple
explanations.]*
Kottegoda, N. T. and Rosso, R. (1998). Statistics, Probability and Reliability for Civil and Environmental Engineers.
New York, NY: McGraw-Hill.**
Kuo, W., Prasad, V. R., Tillman, F. A. and Hwang, C. (2001). Optimal Reliability Design. Cambridge, U K :
Cambridge University Press.***
O'Connor, P. D. T. (1994). Practical Reliability Engineering, 3rd edition. New York, NY: John Wiley & Sons
Inc.*
Queueing theory

Bunday, B. D. (1996). An Introduction to Queueing Theory. London, UK: Arnold.*
Bayesian statistics

Berry, D. A. and Stangl, D. K. (1996). Bayesian Biostatistics. New York, NY: Dekker.**
Carlin, B. P. and Louis, T. A. (1996). Bayes and Empirical Bayes methods for Data Analysis. New York, NY:
Chapman and Hall.**
Congdon, P. (2003). Applied Bayesian Modelling. New York, NY: John Wiley & Sons Inc. [Has excellent models
included.]**
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis. New York, NY: Chapman
and Hall.***
Jensen, F. V. (2001). Bayesian Networks and Decision Graphs. New York, NY: Springer-Verlag.***
Neapolitan, R. E. (2004). Learning Bayesian Networks. Upper Saddle River, NJ: Prentice-Hall.***
Pole, A., West, M. and Harrison, J. (1999). Applied Bayesian Forecasting and Time Series Analysis. New York,
NY: Chapman and Hall, CRC Press.***
Press, S. J. (1989). Bayesian Statistics: Principles, Models and Applications. New York, NY: John Wiley & Sons
Inc.**
Tanner, M. A. (1996). Tools for Statistical Inference, 3rd edition. New York, NY: Springer-Verlag.***
Forecasting

Dunis, C. (1996). Forecasting Financial Markets. New York, NY: John Wiley & Sons Inc.**
Newbold, P. and Bos, T. (1994). Introductory Business and Economic Forecasting, 2nd edition. Cincinnati, OH:
South Western Publishing.**
Pole, A., West, M. and Harrison, J. (1999). Applied Bayesian Forecasting and Time Series Analysis. New York,
NY: Chapman and Hall, CRC Press.***
General risk analysis

Bedford, T. and Cooke, R. (2001). Probabilistic Risk Analysis: Foundations and Methods. Cambridge, UK: Cambridge University Press.**
Newendorp, P. (1975). Decision Analysis for Petroleum Exploration. Tulsa, OK: Penwell.**
Risk communication

Granger Morgan, M., Fischhoff, B., Bostrom, A. and Atman, C. J. (2002). Risk Communication: a Mental Models
Approach. Cambridge, UK: Cambridge University Press.*


Kahneman, D. and Tversky, A . (2000). Choices, Values and Frames. Cambridge, U K : Cambridge University Press.*
National Research Council (1989). Improving Risk Communication. Washington, DC: National Academy Press.*
Slovic, P. (2000). The Perception of Risk. London, U K : Earthscan Publications. [Read his articles too.]*
Epidemiology
Clayton, D. and Hills, M. (1998). Statistical Models in Epidemiology. Oxford, UK: Oxford University Press.**
Dijkhuizen, A. A. and Morris, R. S. (1997). Animal Health Economics: Principles and Applications. Sydney,
Australia: University o f Sydney.**
Dohoo, I., Martin, W . and Stryhn, H. (2003). Veterinary Epidemiological Research. Prince Edward Island, Canada:
AVC Inc.**
Fleiss, J . L., Levin, B. and Cho Paik, M. (2003). Statistical Methods for Rates and Proportions. New York, N Y :
John Wiley & Sons Inc.**
Groenendaal, H. (2005). Epidemiologic and Economic Risk Analysis on Johne's Disease Control. Wageningen, The
Netherlands: Wageningen University.**
McKellar, R. C. and Lu, X . (2004). Modeling Microbial Responses in Food. New York, N Y : Chapman and Hall,
CRC Press.**
McMeekin, T. A., Olley, J. N., Ross, T. and Ratkowsky, D. A. (1993). Predictive Microbiology: Theory and
Application. Taunton, UK: Research Studies Press.**
Pearl, J. (2000). Causality: Models, Reasoning and Inference. Cambridge, U K : Cambridge University Press. [Beautiful turn of phrase.]***
Shipley, W . (2000). Cause and Correlation in Biology. Cambridge, U K : Cambridge University Press.**
Terrence, M. E. and Isaacson, R. E. (2003). Microbial Food Safety in Animal Agriculture. Iowa, IA: Iowa State
University Press.**
Software
Aberth, 0. (1998). Precise Numerical Methods Using C++. San Diego, C A : Academic Press.***
Hauge, J. W. and Paige, K. N. (2004). Learning Simul8: the Complete Guide. Bellingham, WA: Plain Vu Publishers.*
Kloeden, P. E., Platen, E. and Schurz, H. (1994).Numerical Solution of SDE Through Computer Experiments. New
York, N Y : Springer-Verlag.***
London, J. (2006). Modeling Derivatives Applications in Matlab, C++, and Excel. Upper Saddle River, NJ: FT
Press.***
Shepherd, R. (2004). Excel VBA Macro Programming. New York, N Y : McGraw-Hill.**
Walkenbach, J. (2002). Excel 2002 Formulas. New York, N Y : M&T Books.*
Fun

Aczel, A. (1998). Probability 1: Why There Must Be Intelligent Life in the Universe. New York, N Y : Harcourt
Brace.*
Bennett, D. J . (1998).Randomness. Cambridge, M A : Harvard University Press.*
Bernstein, P. L. (1996). Against the Gods. New York, NY: John Wiley & Sons Inc. [I particularly enjoyed the first, non-financial half.]*
Laplace, Marquis de (1951). A Philosophical Essay on Probabilities. New York, NY: Dover.**
Peterson, I. (1998). The Jungles of Randomness: a Mathematical Safari. New York, N Y : John Wiley & Sons Inc.*

References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control
AC19, 716-723.
Akaike, H. (1976). Canonical correlation analysis of time series and the use of an information criterion, in System Identification: Advances and Case Studies, ed. by Mehra, R. K. and Lainotis, D. G. New York, NY: Academic Press; 52-107.
Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain "goodness of fit" criteria based on
stochastic processes. Ann. Math. Stat. 23, 193-212.
Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D. (1997). Thinking coherently. Risk 10(11).
Bartholomew, M. J., Vose, D. J., Tollefson, L. R., Curtis, C. and Travis, C. C. (2005). A linear model for managing
the risk of antimicrobial resistance originating in food animals. Risk Analysis 25(1).
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philos. Trans. R. Soc. London
53, 370-418. Reprinted in Biometrica 45, 293-315 (1958).
Bazaraa, M. S., Jarvis, J. J. and Sherali, H. D. (2004). Linear Programming and Network Flows, 3rd edition. New
York, NY: John Wiley and Sons Inc.
Bazaraa, M. S., Sherali, H. D. and Shetty, C. M. (2006). Nonlinear Programming: Theory and Algorithms, 3rd
edition. New York, NY: John Wiley and Sons Inc.
Bernoulli, J. (1713). Ars Conjectandi. Basilea: Thurnisius.
Birnbaum, Z. W. and Saunders, S. C. (1969). A new family of life distributions. J. Appl. Prob. 6, 637-652.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31,
307-327.
Boone, I., Van der Stede, Y., Bollaerts, K., Vose, D., Daube, G., Aerts, M. and Mintiens, K. (2007). Belgian "farm-to-consumption" risk assessment-model for Salmonella in pigs: methodology for assessing the quality of data and information sources. Research paper available from: ides.boone@var.fgov.be.
Bühlmann, H. (1980). An economic premium principle. ASTIN Bulletin 11, 52-60.
Cannon, R. M. and Roe, R. T. (1982). Livestock Disease Surveys. A Field Manual for Veterinarians. Bureau of
Resource Science, Department of Primary Industry. Australian Government Publishing Service, Canberra.
Chandra, M., Singpurwalla, N. D. and Stephens, M. A. (1981). Kolmogorov statistics for tests of fit for the extreme
value and Weibull distribution. J. Am. Stat. Assoc. 76(375), 729-731.
Cherubini, U., Luciano, 0. and Vecchiato, W. (2004). Copula Methods in Finance. New York, NY: John Wiley &
Sons Inc.
Clark, C. E. (1961). Importance sampling in Monte Carlo analysis. Operational Research 9, 603-620.
Clemen, R. T. and Reilly, T. (2001). Making Hard Decisions. Belmont, CA: Duxbury Press.
Cox, J. C., Ingersoll, J. E. and Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica
53, 385-407.
Dantzig, G. B. and Thapa, M. N. (1997). Linear Programming: I: Introduction. New York, NY: Springer-Verlag.


Dantzig, G. B. and Thapa, M. N. (2003). Linear Programming 2: Theory and Extensions. New York, NY: SpringerVerlag.
Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and their Applications. Cambridge, UK: Cambridge
University Press.
Decker, R. D. and Fitzgibbon, D. J. (1991). The normal and Poisson approximations to the binomial: a closer look.
Department of Mathematics Technical Report No. 82.3. Hartford, CT: University of Hartford.
De Pril, N. (1986). On the exact computation of the aggregate claims distribution in the individual life model.
ASTIN Bulletin 16, 109-112.
De Pril, N. (1989). The aggregate claims distribution in the individual model with arbitrary positive claims. ASTIN
Bulletin 19, 9-24.
Dickson, D. C. M. (2005). Insurance Risk and Ruin. Cambridge, UK: Cambridge University Press.
Ding, Z., Granger, C. W. J. and Engle, R. F. (1993). A long memory property of stock market returns and a new
model. Journal of Empirical Finance 1, 83-106.
Efron, B. (1979). Bootstrap methods: another look at the Jackknife. Ann. Statis. 7, 1-26.
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York, N Y : Chapman and Hall.
Embrechts, P. (2000). Extreme value theory: potential and limitations as an integrated risk management tool.
Derivatives Use, Trading and Regulation 6, 449-456.
Engle, R. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom
inflation. Econometrica 50, 987-1007.
EU (2005). PMWS case definition (Herd level). Sixth Framework Programme Priority SSPl5.4.6. Available at:
www.pcvd.org/documents/BelfastPresentationsCVDinalpmwscasedefinitionUOctober-2OO5.doc
[4 April
20061.
Evans, E., Hastings, N. and Peacock, B. (1993). Statistical Distributions, 2nd edition. New York, NY: John Wiley
& Sons Inc.
FAO/WHO (2000). Risk assessment: Salmonella spp. in broilers and eggs: hazard identification and hazard characterization of Salmonella in broilers and eggs. Preliminary report.
FAO/WHO (2003). Hazard characterization for pathogens in food and water: guidelines. Microbiological Risk
Assessment Series No. 3, ISBN 92 4 156 237 4 (WHO). ISBN 92 5 104 940 8 (FAO). ISSN 1726-5274.
Fletcher, S. G. and Ponnambalam, K. (1996). Estimation of reservoir yield and storage distribution using moments
analysis. Journal of Hydrology 182, 259-275.
Fu, M. (2002). Optimization for simulation: theory vs. practice. INFORMS Journal on Computing 14(3), 192-215.
Funtowicz, S. O. and Ravetz, J. R. (1990). Uncertainty and Quality in Science for Policy. Dordrecht, The Netherlands: Kluwer.
Furumoto, W. A. and Mickey, R. (1967). A mathematical model for the infectivity-dilution curve of tobacco
mosaic virus: theoretical considerations. Virology 32, 216-223.
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis. London, UK: Chapman
and Hall.
Gettinby, G. D., Sinclair, C. D., Power, D. M. and Brown, R. A. (2004). An analysis of the distribution of extreme
share returns in the UK from 1975 to 2000. J. Business Finance and Accounting 31(5), 607-646.
Gilks, W. R., Richardson, R. and Spiegelhalter, D. J. (1996). Markov Chain Monte Carlo in Practice. London, U K :
Chapman and Hall.
Glover, F., Laguna, M. and Marti, R. (2000). Fundamentals of scatter search and path relinking. Control and
Cybernetics 39(3), 653-684.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA:
Addison-Wesley.
Gzyl, H. (1995). The Method of Maximum Entropy. London, UK: World Scientific.


Hald, T., Vose, D., Wegener, H. C. and Koupeev, T. (2004). A Bayesian approach to quantify the contribution of
animal-food sources to human salmonellosis. Risk Analysis 24(1); J. Food Protection, 67(5), 980-992.
Haldane, J. B. S. (1948). The precision of observed values of small frequencies. Biometrika 35, 297-303.
Hannan, E. J. and Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal
Statistical Society, B 41, 190- 195.
Hertz, D. B. and Thomas, H. (1983). Risk Analysis and its Applications. New York, NY: John Wiley & Sons Inc.
(reprinted 1984).
Iman, R. L. and Conover, W. J. (1982). A distribution-free approach to inducing rank order correlation among
input variables. Commun. Statist.-Simula. Computa. 11(3), 311-334.
Iman, R. L., Davenport, J. M. and Zeigler, D. K. (1980). Latin hypercube sampling (a program user's guide).
Technical Report SAND79-1473, Sandia Laboratories, Albuquerque, NM.
Institute of Hydrology (1999). Flood Estimation Handbook. Crowmarsh Gifford, UK: Institute of Hydrology.
Jeffreys, H. (1961). Theory of Probability, 3rd edition. Oxford, UK: Oxford University Press.
Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika 36, 149-176.
Kaplan, S. and Garrick, B. J. (1981). On the quantitative definition of risk. Risk Analysis 1, 11-27.
Klugman, S., Panjer, H. and Willmot G. (1998). Loss Models: From Data to Decisions. New York, NY: John Wiley
and Sons Inc.
Kozubowski, T. J. and Podgórski, K. (no date). Log-Laplace Distributions. Document found at http://unr.edu/
homepage/tkozubow/0logI .pdf.
Kreyszig, E. (1993). Advanced Engineering Mathematics, 7th edition. New York, NY: John Wiley & Sons Inc.
Laplace, P. S. (1774). Memoir on the probability of the causes of events. Mémoires de Mathématique et de Physique, Présentés à l'Académie Royale des Sciences, par divers savants et lus dans ses Assemblées, 6, 621-656. Translated by S. M. Stigler and reprinted in translation in Statistical Science 1(3), 359-378 (1986).
Laplace, P. S. (1812). Théorie analytique des probabilités. Paris: Courcier. Reprinted in Oeuvres Complètes de Laplace, Vol. 7. Paris, France: Gauthiers-Villars (1847).
Madan, D. B., Carr, P. P. and Chang, E. C. (1998). The variance gamma process and option pricing. European
Finance Review 2, 79-105.
McClave, J. T., Dietrich, F. H. and Sincich, T. (1997). Statistics, 7th edition. Englewood Cliffs, NJ: Prentice-Hall.
Morgan, M. G. and Henrion, M. (1990). Uncertainty: a Guide to Dealing with Uncertainty in Quantitative Risk and
Policy Analysis. Cambridge, UK: Cambridge University Press.
Morris, C. N. (1983). Natural exponential families with quadratic variance functions: statistical theory. Ann. Statis.
11, 515-529.
Neapolitan, R. E. (2004). Learning Bayesian Networks. Upper Saddle River, NJ: Pearson Prentice-Hall.
Newcomb, S. (1881). Note on the frequency of use of different digits in natural numbers. Amer. J. Math. 4, 39-40.
OIE (2004). Handbook on Import Risk Analysis for Animals and Animal Products - Volume 2: Quantitative Risk
Assessment. ISBN: 92-9044-629-3.
Panjer, H. H. (1981). Recursive evaluation of a family of compound distributions. ASTIN Bulletin 12, 22-26.
Panjer, H. H. and Willmot, G. E. (1992). Insurance Risk Models. Schaumburg, IL: Society of Actuaries.
Paradine, C. G. and Rivett, B. H. P. (1964). Statistical Methods for Technologists. London, UK: English Universities Press.
Pardalos, P. M. and Resende, M. G. C. (eds) (2002). Handbook of Applied Optimization. New York, NY: Oxford
Academic Press.
Pearl, J. (2000). Causality: Models, Reasoning and Inference. Cambridge, UK: Cambridge University Press.
Popper, K. R. (1988). The Open Universe: An Argument for Indeterminism. Cambridge, UK: Cambridge University
Press.


Press, S. J. (1989). Bayesian Statistics: Principles, Models, and Applications. Wiley series in probability and mathematical statistics. John Wiley & Sons Inc., New York.
Press, W. H., Flannery, B. P., Teukolsky, S. A. and Vetterling, W. T. (1986). Numerical Recipes: The Art of Scientific Computing. Cambridge, UK: Cambridge University Press.
Randin, R. L. (1997). Optimization in Operations Research. Upper Saddle River, NJ: Prentice-Hall.
Robertson, J. P. (1992). The Computation of Aggregate Loss Distributions. Proceedings of the Casualty Actuarial
Society, Arlington, VA, 57-133.
Rubinstein, R. (1981). Simulation and Monte Carlo Methods. New York, NY: John Wiley & Sons Inc.
Savage, L. J. et al. (1962). The Foundations of Statistical Inference. New York, NY: John Wiley & Sons Inc.
(Methuen & Company, London).
Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics 6(2), 461-464.
Shipley, W. (2000). Cause and Correlation in Biology. Cambridge, UK: Cambridge University Press.
Sivia, D. S. (1996). Data Analysis: a Bayesian Tutorial. Oxford, UK: Oxford University Press.
Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. J. Am. Stat. Assoc. 69(347),
730-733.
Stephens, M. A. (1977). Goodness of fit for the extreme value distribution. Biometrica 64(3), 583-588.
Stern, N. J., Clavero, M. R. S., Bailey, J. S., Cox, N. A. and Robach, M. C. (1995). Campylobacter spp. in broilers
on the farm and after transport. Pltry Sci. 74, 937-941.
Stem, N. J. and Robach, M. C. (2003). Enumeration of Campylobacter spp. in broiler feces and in corresponding
processed carcasses. J. Food Protection 66, 1557-1563.
Stumbo, C. R. (1973). Thermobacteriology in food processing. New York, NY: Academic Press.
Taylor, H. M. and Karlin, S. (1998). An Introduction To Stochastic Modelling, 3rd edition. New York, NY: Academic Press.
Teunis, P. F. M. and Havelaar, A. H. (2000). The beta-Poisson dose-response model is not a single-hit model.
Risk Analysis 20(4), 513-520.
Van der Sluijs, J. P., Risbey, J. S. and Ravetz, J. (2005). Uncertainty assessment of VOC emissions from paint in
the Netherlands using the Nusap system. Environmental Monitoring and Assessment (105), 229-259.
Van Boekel, M. A. J. S. (2002). On the use of the Weibull model to describe thermal inactivation of microbial
vegetative cells. Intl J. Food Microbiology 74, 139-159.
Van Gelder, P., de Ronde, P., Neykov, N. M. and Neytchev, P. (2000). Regional frequency analysis of extreme
wave heights: trading space for time. Proceedings of the 27th ICCE Coastal Engineering 2000, Sydney, Australia,
2, 1099-1112.
Wang, S. S. (1996). Premium calculation by transforming the layer premium density. ASTIN Bulletin 26(1), 71-92.
Wang, S. S. (2003). Equilibrium pricing transforms: new results using Buhlmann's 1980 economic model. ASTIN
Bulletin 33(1), 57-73.
Williams, P. R. D. (1999). A comparison of food safety risks: science, perceptions, and values. Unpublished doctoral
dissertation, Harvard School of Public Health.

Index
@RISK 109-110
Accident insurance 509
Accuracy
Of mean 156
Of percentile 158
Aggregate distributions 305, 466
Compound Poisson approximation 312, 508
De Pril method 311, 507
Fast Fourier transform method 310, 317, 495, 497, 510
Implementing in ModelRisk 578
Moments 305
Panjer's recursive method 308
AIC
See Akaike Information Criterion
Akaike Information Criterion 295
Anderson-Darling statistic 284-286, 292-294
Approximating one distribution with another 703
Archimedean copulas 368-371
Arrhenius-Davey attenuation model, bacteria 524
Assumptions 22, 69
Attenuation models, bacteria 519
Autoregressive time series models 335-339
APARCH 338
AR 335
ARCH/GARCH 337
ARMA 337
EGARCH 339
Bayes' theorem 126, 215
Bayesian belief networks 427
Bayesian bootstrap 254
Bayesian inference 215
By simulation 225
Bělehrádek growth model, bacteria 522
Bernoulli distribution 599
Bernoulli, Jakob 599
Beta distribution 158, 171, 232-4, 249, 254, 272, 600
Approximation 238, 709

Beta4 distribution 601
Beta-Binomial distribution 173, 193, 233, 602
Beta-binomial dose-response model 530
Beta-Geometric distribution 604
Beta-Negative Binomial distribution 173, 605
Beta-Poisson dose-response model 529
BetaPERT distribution
See PERT distribution
BIC
See Schwarz Information Criterion
Bilateral Exponential distribution
See Laplace distribution
Binned data 283-284
Binomial coefficient 124
Binomial distribution 226, 249, 344, 606, 710
Approximations 703
Derivation 168
Binomial process 167, 212, 234
Binomial theorem 124
Bootstrap 246-254
Non-parametric 247-249
Parametric 249-254, 298
Bradford distribution 607
Bradford law of scattering 607
Bradford, Samuel Clement 607
Brainstorming sessions 414
Bounded and unbounded distributions 586, 588
Bühlmann credibility factor 494
Burr distribution 609
Burr, Type III
See Burr distribution
Calibrating experts 412
Cauchy distribution 610
Causality 423-434
Cdf
See Cumulative distribution function
Censored data 283-284
Central limit theorem 122, 188



Chi distribution 612
Chi-Squared distribution 210, 613, 630
Approximation 709
Chi-Squared goodness-of-fit 287-290
Chi-Squared test 210
Claim size distribution 509
Classical statistics 208, 297
Classification trees 426
Clayton copula 313, 368-371
Compound Poisson approximation for aggregate distribution 312, 518
Confidence intervals 285
Contingency allocation in projects 476-477
Continuous distributions 586
Copulas 362, 367-380
Archimedean 313, 368-371
Clayton 313, 369
Direction 378
Elliptical 371-376
Empirical 379
Frank 317, 370
Gumbel 370
Implementing in ModelRisk 576
Modelling with 377-380
Normal 372
Student 373
Correlation 354
Copulas 367-380
Envelope method 380-391
In insurance portfolios 511
Lookup method 391-392
Rank order 356-367
Of aggregate distributions 312
Of risks in a project portfolio 487-491
Counterfactual world 423
Covariance 354
Creating your own distributions 696
Credit rating, transition 499-503
Credit risk 494
Critical index 483
Critical path analysis in schedule risk 483-486
Critical values 285
Crystal Ball 109-110
Cumulative Ascending distribution 270, 614
Cumulative Descending distribution 615
Cumulative distribution function (cdf) 115, 596
Cumulative distribution plots 73
Overlaying 77
Relationship to density plots 78
Second order 76
Cyclicity, in a time series 323

DAG
See Bayesian belief networks
Dagum distribution 616
Data
Errors 266
Overdispersion 267
Truncated, censored or binned 283-284
DCF
See Discounted cashflow
de Finetti's theorem 226
De Pril method 311, 507
Death model
See Pure death model
Decision trees 40
Deductible (in insurance) 509
Default probability 494, 499
Delaporte distribution 183, 618
Dependency 265
Directed acyclic graphs
See Bayesian belief networks
Dirichlet distribution 175, 690
Disaggregation, in subjective estimates 401
Discounted cashflow models 461-472
Discrete distribution 619
Discrete distributions 408, 585
Discrete event simulation 40
Discrete Uniform distribution 226, 248, 621
Dose-response model 527-532
Double-Exponential distribution 650
Distributions
Approximations 703
Bernoulli 599
Beta 158, 171, 232-4, 238, 249, 254, 272,600
Beta4 601
Beta-Binomial 173, 193, 233, 602
Beta-Geometric 604
Beta-Negative Binomial 173, 605
Binomial 168, 226, 249, 344, 606, 710
Bradford 607
Bounded and unbounded 586
Burr 609
Cauchy 610
Chi 612
Chi-squared 210, 613, 630
Continuous 264, 586
Creating your own 696
Cumulative Ascending 270, 614
Cumulative Descending 615
Dagum 616
Delaporte 183, 618
Dirichlet 175, 690

Discrete 264, 408, 585, 619
Discrete Uniform 226, 248, 621
Empirical 700
Erlang 623
Error Function 622
Expert estimates 593
Exponential 179, 235, 626
Exponential family 627
Extreme Value Max 627
Extreme Value Min 629
F 630
Fatigue Life 631
Fitting 263-300
Four-parameter beta
See Beta4 distribution
Four-parameter Kumaraswamy distribution
See Kumaraswamy4 distribution
Frequency 590
Gamma 180, 189, 298, 632
Geometric 634, 712
Generalised Logistic 635
Histogram 275, 637
Hyperbolic-Secant 638
Hypergeometric 639, 712
Inverse Gaussian 641
Inverse Hypergeometric 642
Inverse Multivariate Hypergeometric 691
Inverse Multivariate Hypergeometric2 692
Johnson Bounded 644
Johnson Unbounded 645
Kumaraswamy 646
Kumaraswamy4 648
Laplace 649
Lévy 650
Logarithmic 651
LogGamma 652
Logistic 653
LogLaplace 655
LogLogistic 656
LogNormal 123, 658
LogNormalB 659
LogNormalE 660
LogWeibull 699
Modified PERT 406, 661
Multinomial 174, 340, 694
Multivariate 589, 690
Multivariate Hypergeometric 692
Multivariate Normal 695
Negative Binomial 169, 171, 343, 662, 711
Negative Multinomial 174, 695
Negative Multinomial2 696

Normal 122, 189, 209, 235, 252, 301, 665
Ogive 270, 666
Parametric and non-parametric 587
Pareto 512, 667
Pareto2 668
Pearson5 670
Pearson6 671
PERT 405, 672
Poisson 178, 235, 252, 618, 674, 711
Poisson-Lognormal 699
Pólya 183, 618, 675
Rayleigh 116, 677
Reciprocal 678
Relative 407, 679
Risk impact 590
Slash 680
Split Triangle 682
Step Uniform 379, 683
Student 210, 269, 685
Time or trials until 590
Triangle 163, 403, 686
Uniform 163, 229-231,404, 687
Variations in a financial market 591
Weibull 689
Elliptical copulas 371-376
Empirical copula 379
Envelope method of correlation 380-391
Erlang, A.K. 624
Erlang B distribution 624
Erlang C distribution 624
Erlang distribution 623
Error distribution 624
Error function distribution 622
Errors, in data 266
Errors, in subjective estimates 394-401
Errors, most common modelling 159
Esscher principle, in insurance premiums 514
Event trees 39
Excel 109
Running faster 147
Solver 152, 283, 439-444
Expected shortfall 497, 505-506
Expected value principle, in insurance premiums 514
Exponential distribution 179, 235, 626
Exponential dose-response model 527
Exponential family of distributions 627
Exponential growth model, bacteria 521
Exponential Power distribution
See Error distribution
Extreme Value Max distribution 627

Extreme Value Min distribution 629
Extreme values, modelling 512-513
F distribution 630
Fast Fourier transform 310, 317, 495, 497, 510
Fatigue Life distribution 631
Fault trees 40
Fisher information 232
Fisher-Snedecor distribution
See F distribution
Forecasting
See Time series
Four-parameter beta distribution
See Beta4 distribution
Four-parameter Kumaraswamy distribution
See Kumaraswamy4 distribution
Frank copula 317, 370
Frequency distributions 590
Furry distribution
See Geometric distribution
Gamma distribution 180, 189, 298, 632
Approximation 709
Generalized Logistic distribution 635
Generalized Error distribution
See Error distribution
Geometric Brownian motion 328-332
With mean reversion 332-334
With jump diffusion 334-335
With mean reversion and jump diffusion 335
Geometric distribution 634, 712
Gibbs sampling 246
GOF
See Goodness-of-fit statistics
Goodness-of-fit plots 285-297
Goodness-of-fit statistics 284-295
Gosset, William Sealy 685
Growth models, bacteria 519
Gumbel copula 370
Gumbel-Hougard copula
See Gumbel copula
Hannan-Quinn information criterion 295
Histogram distribution 275, 637
Histogram plots 70
HQIC
See Hannan-Quinn information criterion
Hyperbolic secant distribution 638
Hypergeometric distribution 183, 639, 712
Approximations 184, 705

Hypergeometric process 183
Multivariate 184
Hyperparameters 225, 233
Hyperpriors 233
Importance sampling 62
Individual life model 507
Influence diagrams 38
Information criteria 294-295, 351
Insurance
Accident 509
Correlated portfolio 511
Permanent life 508
Term life 506
Internal rate of return 470
Interpercentile range 97
Inverse Burr distribution
See Dagum distribution
Inverse Gaussian distribution 641
Inverse Hypergeometric distribution 185, 642
Inverse Multivariate Hypergeometric distribution 691
Inverse Multivariate Hypergeometric2 distribution 692
Inverse Paralogistic distribution 617
IRR
See Internal rate of return
Jackknife 247
Jacobian transformation 230
Johnson Bounded distribution 644
Johnson Unbounded distribution 645
Kappa distribution
See Dagum distribution
Kolmogorov-Smirnoff statistic 284-286, 291-292
Kumaraswamy distribution 646
Kumaraswamy4 distribution 648
Kurtosis 97, 141, 596
Laplace distribution 624, 649
Latin hypercube sampling 59, 267, 684, 697
Leading indicators, in forecasting 348-351
Least squares regression 131, 256, 355
Lecturers, guide for 567
Lévy distribution 650
LHS
See Latin hypercube sampling
Likelihood function 216, 235, 236-242
Likelihood principle 236
Liquidity risk 503

Location parameters 231, 594
Logarithmic distribution 651
LogGamma distribution 652
Logistic distribution 653
Logistic model, bacterial growth and attenuation
525
LogLaplace distribution 655
LogLogistic distribution 656
LogNormal distribution 123, 658
Approximation 709
LogNormalB distribution 659
LogNormalE distribution 660
Long-term forecasts 352
Lookup table method of correlation 391-392
Loss given default 494, 497
m-Erlang distribution
See Erlang distribution
Macros 152, 203, 444, 581
Market risk 503
Markov chain Monte Carlo 246
Markov chain time series 339-343, 499-503
Markov inequality 129
Martingales 194
MaxEnt
See Maximum entropy principle
Maximum entropy principle 254
Maximum likelihood estimators 237, 250, 281
Mean 93, 137, 156, 209, 596
Mean deviation 96
Median 92, 138
Megaformulae 456
Mersenne twister 570
Metropolis algorithm 245
Mid-point Latin hypercube sampling 62
Mixture distributions 193
MLE
See Maximum likelihood estimators
Mode 92, 138, 696
ModelRisk 111-112, 569
and Excel 570
distribution functions 570
object functions 572
risk event 576
Modified PERT distribution 406, 661
Moments, raw and central 141
Monte Carlo simulation 45, 57
Monty Hall problem 228
Multinomial distribution 174, 340, 694
Multivariate distributions 589
Multivariate Hypergeometric distribution 692

Multivariate Normal distribution 695
Multivariate Poisson process 181
Munchausen, Baron 247
Negative Binomial distribution 171, 343, 662, 711
Approximations 707
Derivation 169
Negative Multinomial distribution 174, 695
Negative Multinomial2 distribution 696
Net present value 469
Neural nets 426
Non-parametric
Bootstrap 247-249
Copula (empirical) 379
Distribution fitting 269-280
Normal copula 372
Normal distribution 122, 139, 189, 209, 235, 242-245, 252, 301, 665
Notation used in this book 112
NPV
See Net present value
Ogive distribution 270, 666
Operational risk 493
Optimisation 435-449
OptQuest 445-449
Panjer's recursive method 308
PAR
See Population attributable risk
Parametric
Bootstrap 249-254, 298
Distribution fitting 281-300
Parametric and non-parametric distributions 587
Pareto distribution 512, 667
Pareto2 distribution 668
Pascal's triangle 125
Pdf
See Probability density function
Pearson family of distributions 670
Pearson5 distribution 670
Pearson6 distribution 671
Percentiles 100, 158
Permanent life insurance 508
PERT distribution 406, 672
PH principle, in insurance premiums
See Risk-adjusted principle
P-I tables
See Probability-impact tables
Pivotal method 209

Pmf
See Probability mass function
Poisson distribution 182, 235, 252, 282, 618, 674, 711
Approximation 242, 705
Derivation 178
Mixture 182
Poisson process 176, 213, 229, 240
Compound Poisson approximation 312
Multivariate 181
Regression 345-347
Pólya distribution 183, 618, 675
Pólya regression 346-347
Population attributable risk 424
Posterior distribution 216, 236
P-P plots 296-297
Premium calculations 513-515
Prior distribution 215, 228
Conjugate 233
Improper 232
Jeffreys 231
Multivariate 236
Subjective 234
Uninformed 229
Probability, definition of 118
Probability density function 116, 594
Probability equations, how to read 593
Probability mass function 115, 594
Probability-impact tables 14
Project risk analysis 473-491
Cascading risks 487-491
Contingency allocation 476-477
Cost risk analysis 474-478
Critical index 483
Critical path analysis 483-486
Risk portfolio 486
Schedule risk analysis 478-483
Prompt lists 6
Pure death model 344, 523
Q-Q plots 296-297

Random number generator seed 63
Randomness, in a time series 323
Rank order correlation 356-367
Coefficient 136
Raspe, Rudolph Erich 247
Rayleigh distribution 116, 677
Reciprocal distribution 678
Recursive formulae 710

Relative distribution 407, 679
Renewal process 190
Report writing 67
Risk-adjusted principle, in insurance premiums 515
Risk analyst, qualities 24
Risk event 159, 490, 576
Risk factor 424
Risk management
Evaluation 10
Possible options 7
Risk ranking 16
Risk registers 13
Risk-return plots 90
Risk transfer 11
Risks, identifying 5
Sample size, in data 267
Scale parameters 231, 594
Scatter plots
For sensitivity analysis 87
Of data in correlation analysis 355-359
Of simulation results 154-156
Of time series 321, 350
Schwarz Information Criterion 295
Seasonality, in a time series 343
Semi-standard deviation 96
Semi-variance 96
Sensitivity analysis 80-88
Severity scores 17
Shape parameters 594
SIC
See Schwarz Information Criterion
Slash distribution 680
Skewness 97, 140, 596
Solver, Excel's 152, 283, 439-444
Spearman's rank order correlation coefficient 136
Spider plots 85
Split Triangle distribution 682
Stable distributions 650
Standard deviation 95, 96, 97, 139
Standard deviation principle, in insurance premiums 514
Step Uniform distribution 379, 683
Stirling's formula 126
Stochastic dominance tests 100
Stochastic optimization 438
Strong law of large numbers 121
Student copula 373
Student distribution 210, 269, 685
Approximation 710

Subadditivity 504
Subjective estimation 393-422
Brainstorming sessions 414
Calibrating experts 412
Conducting an interview 416
Cumulative distribution 407
Disaggregation 402
Discrete distribution 408
Errors in 394-401
Managing differing opinions 410
PERT distribution 405, 406
Relative 407
Triangle distribution 403
Uniform distribution 404
Sufficient statistic 209

T copula
See Student copula
T distribution
See Student distribution
T-test 211
Taylor series 128, 236
Tchebysheff's rule 129
Term life insurance 506
Time series
Autoregressive models 335-339
APARCH 338
AR 335
ARCH/GARCH 337
ARMA 337
EGARCH 339
Geometric Brownian motion (GBM) 328-332
GBM with jump diffusion 334-335
GBM with mean reversion 332-334
GBM with mean reversion and jump diffusion 335
Implementing in ModelRisk 579
Intervention, effect of 463
Leading indicators 348-351
Long-term 352
Market share 464
Markov Chains 339-343
Poisson regression 345-347
Pólya regression 346-347
Pure death model 344
Sales volumes 465-466
Yule growth model 343
Tornado charts 80, 489
Trend, in a time series 323
Trend plots 88
Triangle distribution 163, 403, 686
Truncated data 283-284
Two-tailed Exponential distribution
See Laplace distribution

U-parameter 570
Uncertainty 48
Uniform distribution 163, 229-231, 404, 687
Utility theory 11-13

Validation, of model 451-460
Value at risk 503-505
Value of information 102
Value of imperfect information (VOII) 104
Value of perfect information (VOPI) 103
Van Boekel attenuation model, bacteria 526
Vandermonde's theorem 126
VaR
See Value at risk
Variability 47
Variance 94, 96, 138, 210, 596
Venn diagrams 119
Verhulst, Pierre 525
VISIFIT software 644
VOII
See Value of information
VOPI
See Value of information
Vose Consulting 721

Weibull distribution 689
Weibull-gamma dose-response model 531
What-if scenarios 3
WinBUGS 246, 298, 547-550

Yule growth model 343, 522
