
All rights reserved.
The text of this publication, or any part thereof, may not be reproduced or transmitted in any form
or by any means, electronic or mechanical, including photocopying, recording, storage in an
information retrieval system, or otherwise, without prior permission of the publisher.
Whilst every effort has been made to ensure that the contents of this book are accurate, no
responsibility for loss occasioned to any person acting or refraining from action as a result of any
material in this publication can be accepted by the publisher or authors. In addition to this, the
authors and publishers accept no legal responsibility or liability for any errors or omissions in
relation to the contents of this book.
Level 1
First Edition 2012
This study manual has been fully revised and updated
in accordance with the current syllabus.
It has been developed in consultation with experienced lecturers.
Title Page
Introduction to the Course
Estimating Probabilities
Types of Event
The Two Laws of Probability
Tree Diagrams
Binomial Distribution
Poisson Distribution
Venn Diagrams
Collection of Data
Types of Data
Requirements of Statistical Data
Methods of Collecting Data
Designing the Questionnaire
Choice of Method
Pareto Distribution and the “80:20” Rule
Introduction to Classification & Tabulation of Data
Forms of Tabulation
Secondary Statistical Tabulation
Rules for Tabulation
Sources of Data & Presentation Methods
Introduction to Frequency Distributions
Preparation of Frequency Distributions
Cumulative Frequency Distributions
Relative Frequency Distributions
Graphical Representation of Frequency Distributions
Introduction to Other Types of Data Presentation
Pie Charts
Bar Charts
General Rules for Graphical Presentation
The Lorenz Curve
Title Page
The Need for Measures of Location
The Arithmetic Mean
The Mode
The Median
Introduction to Dispersion
The Range
The Quartile Deviation, Deciles and Percentiles
The Standard Deviation
The Coefficient of Variation
Averages & Measures of Dispersion
The Normal Distribution
Calculations Using Tables of the Normal Distribution
The Basic Idea
Building Up an Index Number
Weighted Index Numbers
Quantity or Volume Index Numbers
The Chain-Base Method
Deflation of Time Series
Simple Interest
Compound Interest
Introduction to Discounted Cash Flow Problems
Two Basic DCF Methods
Introduction to Financial Mathematics
Manipulation of Inequalities
Scatter Diagrams
The Correlation Coefficient
Rank Correlation
Regression Lines
Use of Regression
Connection between Correlation and Regression
Title Page
Structure of a Time Series
Calculation of Component Factors for the Additive Model
The Z Chart
The Graphical Method
The Graphical Method Using Simultaneous Equations
Sensitivity Analysis (graphical)
The Principles of the Simplex Method
Sensitivity Analysis (simplex)
Using Computer Packages
Using Linear Programming
Risk & Uncertainty
Allowing for Uncertainty
Probabilities and Expected Value
Decision Rules
Decision Trees
The Value of Information
Sensitivity Analysis
Simulation Models
Origins of Spreadsheets
Modern Spreadsheets
How Spreadsheets Work
Users of Spreadsheets
Advantages & Disadvantages of Spreadsheets
Spreadsheets in Today’s Climate
Stage: Level 1
Subject Title: L1.4 Business Mathematics
The aim of this subject is to provide students with the tools and techniques to understand the
mathematics associated with managing business operations. Probability and risk play an
important role in developing business strategy. Preparing forecasts and establishing the
relationships between variables are an integral part of budgeting and planning. Financial
mathematics provides an introduction to interest rates and annuities and to investment
appraisals for projects. Preparing graphs and tables in summarised formats and using
spreadsheets are important in both the calculation of data and the presentation of information
to users.
Learning Objectives:
On successful completion of this subject students should be able to:
Demonstrate the use of basic mathematics and solve equations and inequalities
Calculate probability and demonstrate the use of probability where risk and
uncertainty exist
Apply techniques for summarising and analysing data
Calculate the correlation coefficient for bivariate data and apply techniques for simple
linear regression
Demonstrate forecasting techniques and prepare forecasts
Calculate present and future values of cash flows and apply financial mathematical
techniques
Apply spreadsheets to calculate and present data
1. Basic Mathematics
Use of formulae, including negative powers as in the formulae for the learning curve
Order of operations in formulae, including brackets, powers and roots
Percentages and ratios
Rounding of numbers
Basic algebraic techniques and solution of equations, including simultaneous
equations and quadratic equations
Graphs of linear and quadratic equations
Manipulation of inequalities
2. Probability
Probability and its relationship with proportion and per cent
Addition and multiplication rules of probability theory
Venn diagrams
Expected values and expected value tables
Risk and uncertainty
3. Summarising and Analysing Data
Data and information
Tabulation of data
Graphs, charts and diagrams: scatter diagrams, histograms, bar charts and ogives.
Summary measures of central tendency and dispersion for both grouped and
ungrouped data
Frequency distributions
Normal distribution
Pareto distribution and the “80:20” rule
Index numbers
4. Relationships between variables
Scatter diagrams
Correlation coefficient: Spearman’s rank correlation coefficient and Pearson’s
correlation coefficient
Simple linear regression
5. Forecasting
Time series analysis – graphical analysis
Trends in time series – graphs, moving averages and linear regressions
Seasonal variations using both additive and multiplicative models
Forecasting and its limitations
6. Financial Mathematics
Simple and compound interest
Present value (including using formulae and tables)
Annuities and perpetuities
Loans and mortgages
Sinking funds and savings funds
Discounting to find net present value (NPV) and internal rate of return (IRR)
Interpretation of NPV and IRR
7. Spreadsheets
Features and functions of commonly used spreadsheet software: workbook,
worksheet, rows, columns, cells, data, text, formulae, formatting, printing, graphs and charts
Advantages and disadvantages of spreadsheet software, when compared to manual
analysis and other types of software application packages
Use of spreadsheet software in the day to day work: budgeting, forecasting, reporting
performance, variance analysis, what-if analysis, discounted cash flow calculations
Estimating Probabilities
Theoretical Probabilities
Empirical Probabilities
Types of Event
The Two Laws of Probability
Addition Law for Mutually Exclusive Events
Addition Law for a Complete List of Mutually Exclusive Events
Addition Law for Non-Mutually-Exclusive Events
Multiplication Law for Independent Events
Distinguishing the Laws
Tree Diagrams
Binomial Distribution
Poisson Distribution
Venn Diagrams
Suppose someone tells you “there is a 50-50 chance that we will be able to deliver your order
on Friday”. This statement means something intuitively, even though when Friday arrives
there are only two outcomes. Either the order will be delivered or it will not. Statements like
this are trying to put probabilities or chances on uncertain events.
Probability is measured on a scale between 0 and 1. Any event which is impossible has a
probability of 0, and any event which is certain to occur has a probability of 1. For example,
the probability that the sun will not rise tomorrow is 0; the probability that a light bulb will
fail sooner or later is 1. For uncertain events, the probability of occurrence is somewhere
between 0 and 1. The 50-50 chance mentioned above is equivalent to a probability of 0.5.
Try to estimate probabilities for the following events. Remember that events which are more
likely to occur than not have probabilities which are greater than 0.5, and the more certain
they are the closer the probabilities are to 1. Similarly, events which are more likely not to
occur have probabilities which are less than 0.5. The probabilities get closer to 0 as the events
get more unlikely.
(a) The probability that a coin will fall heads when tossed.
(b) The probability that it will snow next Christmas.
(c) The probability that sales for your company will reach record levels next year.
(d) The probability that your car will not break down on your next journey.
(e) The probability that the throw of a dice will show a six.
The probabilities are as follows:
(a) The probability of heads is 0.5.
(b) This probability is quite low. It is somewhere between 0 and 0.1.
(c) You can answer this one yourself.
(d) This depends on how frequently your car is serviced. For a reliable car it should be
greater than 0.99.
(e) The probability of a six is 1/6 or 0.167.
Theoretical Probabilities
Sometimes probabilities can be specified by considering the physical aspects of the situation.
For example, consider the tossing of a coin. What is the probability that it will fall heads?
There are two sides to a coin. There is no reason to favour either side as a coin is
symmetrical. Therefore the probability of heads, which we call P(H) is:
P(H) = 0.5.
Another example is throwing a dice. A dice has six sides. Again, assuming it is not weighted
in favour of any of the sides, there is no reason to favour one side rather than another.
Therefore the probability of a six showing uppermost, P(6), is:
P(6) = 1/6 = 0.167.
As a third and final example, imagine a box containing 100 beads of which 23 are black and
77 white. If we pick one bead out of the box at random (blindfold and with the box well
shaken up) what is the probability that we will draw a black bead? We have 23 chances out of
100, so the probability is:
23/100 (or P = 0.23)
Probabilities of this kind, where we can assess them from our prior knowledge of the
situation, are also called “a priori” probabilities.
In general terms, we can say that if an event E can happen in h ways out of a total of n
possible equally likely ways, then the probability of that event occurring (called a success) is
given by:

P(E) = h/n = Number of possible ways of E occurring / Total number of possible outcomes

Empirical Probabilities
Often it is not possible to give a theoretical probability of an event. For example, what is the
probability that an item on a production line will fail a quality control test? This question can
be answered either by measuring the probability in a test situation (i.e. empirically) or by
relying on previous results. If 100 items are taken from the production line and tested, then:

Probability of failure P(F) = Number of items which fail / Total number of items tested

So, if 5 items actually fail the test:

P(F) = 5/100 = 0.05.
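These relative-frequency calculations are simple enough to check in a few lines of Python (our choice of language for illustration; the manual itself assumes only a calculator):

```python
# Empirical probability as a relative frequency, using the figures in the text:
# 100 items tested, 5 fail the quality-control test.
items_tested = 100
items_failed = 5
p_fail = items_failed / items_tested
print(p_fail)  # 0.05

# The "a priori" bead example: 23 black beads out of 100.
p_black = 23 / 100
print(p_black)  # 0.23
```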
Sometimes it is not possible to set up an experiment to calculate an empirical probability. For
example, what are your chances of passing a particular examination? You cannot sit a series
of examinations to answer this. Previous results must be used. If you have taken 12
examinations in the past, and failed only one, you might estimate:

P(pass) = 11/12 = 0.92.
There are five types of event:
Mutually exclusive
Non-mutually-exclusive
Independent
Dependent or non-independent
Complementary
(a) Mutually Exclusive Events
If two events are mutually exclusive then the occurrence of one event precludes the
possibility of the other occurring. For example, the two sides of a coin are mutually exclusive
since, on the throw of the coin, “heads” automatically rules out the possibility of “tails”. On
the throw of a dice, a six excludes all other possibilities. In fact, all the sides of a dice are
mutually exclusive; the occurrence of any one of them as the top face automatically excludes
any of the others.
(b) Non-Mutually-Exclusive Events
These are events which can occur together. For example, in a pack of playing cards hearts and queens are non-mutually-exclusive since
there is one card, the queen of hearts, which is both a heart and a queen and so satisfies both
criteria for success.
(c) Independent Events
These are events which are not mutually exclusive and where the occurrence of one event
does not affect the occurrence of the other. For example, the tossing of a coin in no way
affects the result of the next toss of the coin; each toss has an independent outcome.
(d) Dependent or Non-Independent Events
These are situations where the outcome of one event is dependent on another event. The
probability of a car owner being able to drive to work in his car is dependent on him being
able to start the car. The probability of him being able to drive to work given that the car
starts is a conditional probability and is written:
P(Drive to work|Car starts)
where the vertical line is a shorthand way of writing “given that”.
(e) Complementary Events
An event either occurs or it does not occur, i.e. we are certain that one or other of these
situations holds.
For example, if we throw a dice and denote the event where a six is uppermost by A, and the
event where either a one, two, three, four or five is uppermost by Ā (or not A) then A and Ā
are complementary, i.e. they are mutually exclusive with a total probability of 1. Thus:
P(A) + P(Ā) = 1.
This relationship between complementary events is useful as it is often easier to find the
probability of an event not occurring than to find the probability that it does occur. Using the
above formula, we can always find P(A) by subtracting P(Ā) from 1.
Addition Law for Mutually Exclusive Events
Consider again the example of throwing a dice. You will remember that:
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.
What is the chance of getting 1, 2 or 3?
From the symmetry of the dice you can see that P(1 or 2 or 3) = 0.5. But also, from the
equations shown above you can see that
P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 0.5.
This illustrates that:
P(1 or 2 or 3) = P(1) + P(2) + P(3)
This result is a general one and it is called the addition law of probabilities for mutually
exclusive events. It is used to calculate the probability of one of any group of mutually
exclusive events. It is stated more generally as:
P(A or B or ... or N) = P(A) + P(B) + ... + P(N)
where A, B ... N are mutually exclusive events.
Addition Law for a Complete List of Mutually Exclusive Events
(a) If all possible mutually exclusive events are listed, then it is certain that one of these
outcomes will occur. For example, when the dice is tossed there must be one number
showing afterwards.
P(1 or 2 or 3 or 4 or 5 or 6) = 1.
Using the addition law for mutually exclusive events, this can also be stated as
P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1.
Again this is a general rule. The sum of the probabilities of a complete list of mutually
exclusive events will always be 1.
An urn contains 100 coloured balls. Five of these are red, seven are blue and the rest are
white. One ball is to be drawn at random from the urn.
What is the probability that it will be red?
P(R) = 5/100 = 0.05.
What is the probability that it will be red or blue?
P(R or B) = P(R) + P(B) = 0.05 + 0.07 = 0.12.
This result uses the addition law for mutually exclusive events since a ball cannot be both
blue and red.
What is the probability that it will be white?
The ball must be either red or blue or white. This is a complete list of mutually exclusive events.
Therefore P(R) + P(B) + P(W) = 1
P(W) = 1 - P(R) - P(B)
= 1 - 0.05 - 0.07
= 0.88
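The urn calculation can be verified with exact fractions, for example in Python (an illustrative sketch, not part of the manual):

```python
from fractions import Fraction

# Urn example: 100 balls, of which 5 are red and 7 are blue.
p_red = Fraction(5, 100)
p_blue = Fraction(7, 100)

# Addition law for mutually exclusive events:
p_red_or_blue = p_red + p_blue
print(p_red_or_blue)  # 3/25, i.e. 0.12

# A complete list of mutually exclusive events sums to 1:
p_white = 1 - p_red - p_blue
print(p_white)  # 22/25, i.e. 0.88
```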
Addition Law for Non-Mutually-Exclusive Events
Events which are non-mutually-exclusive are, by definition, capable of occurring together.
The addition law can still be used but the probability of the events occurring together must be subtracted:
P(A or B or both) = P(A) + P(B) - P(A and B).
(a) If one card is drawn from a pack of 52 playing cards, what is the probability: (i) that it
is either a spade or an ace; (ii) that it is either a spade or the ace of diamonds?
(i) Let event B be “the card is a spade”. Let event A be “the card is an ace”.
We require P(spade or ace [or both]) = P(A or B)
= P(A) + P(B) - P(A and B)
= 4/52 + 13/52 - 1/52 = 16/52 = 4/13.

(ii) A spade and the ace of diamonds cannot both be drawn on a single draw, so the events are mutually exclusive:
P(spade or ace of diamonds) = 13/52 + 1/52 = 14/52 = 7/26.
(b) At a local shop 50% of customers buy unwrapped bread and 60% buy wrapped bread.
What proportion of customers buy at least one kind of bread if 20% buy both wrapped
and unwrapped bread?
Let S represent all the customers.
Let T represent those customers buying unwrapped bread.
Let W represent those customers buying wrapped bread.
P(buy at least one kind of bread) = P(buy wrapped or unwrapped or both)
= P(T or W)
= P(T) + P(W) - P(T and W)
= 0.5 + 0.6 – 0.2
= 0.9
So, 9/10 of the customers buy at least one kind of bread.
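A quick Python sketch (illustrative only) confirms the inclusion-exclusion arithmetic using exact fractions:

```python
from fractions import Fraction

# Bread example: 50% buy unwrapped (T), 60% buy wrapped (W), 20% buy both.
p_t = Fraction(1, 2)
p_w = Fraction(3, 5)
p_both = Fraction(1, 5)

# Addition law for non-mutually-exclusive events:
p_at_least_one = p_t + p_w - p_both
print(p_at_least_one)  # 9/10
```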
Multiplication Law for Independent Events
Consider an item on a production line. This item could be defective or acceptable. These two
possibilities are mutually exclusive and represent a complete list of alternatives. Assume that:
Probability that it is defective, P(D) = 0.2
Probability that it is acceptable, P(A) = 0.8.
Now consider another facet of these items. There is a system for checking them, but only
every tenth item is checked. This is shown as:
Probability that it is checked P(C) = 0.1
Probability that it is not checked P(N) = 0.9.
Again these two possibilities are mutually exclusive and they represent a complete list of
alternatives. An item is either checked or it is not.
Consider the possibility that an individual item is both defective and not checked. These two
events can obviously both occur together so they are not mutually exclusive. They are,
however, independent. That is to say, whether an item is defective or acceptable does not
affect the probability of it being tested.
There are also other kinds of independent events. If you toss a coin once and then again a
second time, the outcome of the second test is independent of the results of the first one. The
results of any third or subsequent test are also independent of any previous results. The
probability of heads on any test is 0.5 even if all the previous tests have resulted in heads.
To work out the probability of two independent events both happening, you use the
multiplication law. This can be stated as:
P(A and B) = P(A) x P(B) if A and B are independent events.
Again this result is true for any number of independent events.
So P(A and B and ... and N) = P(A) x P(B) x ... x P(N).
Consider the example above. For any item:
Probability that it is defective, P(D) = 0.2
Probability that it is acceptable, P(A) = 0.8
Probability that it is checked, P(C) = 0.1
Probability that it is not checked, P(N) = 0.9.
Using the multiplication law to calculate the probability that an item is both defective and not checked:
P(D and N) = 0.2 x 0.9 = 0.18.
The probabilities of the other combinations of independent events can also be calculated.
P(D and C) = 0.2 x 0.1 = 0.02
P(A and N) = 0.8 x 0.9 = 0.72
P(A and C) = 0.8 x 0.1 = 0.08.
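These four combinations can be tabulated programmatically; the short Python sketch below (illustrative, not from the manual) also confirms that they form a complete list summing to 1:

```python
# Production-line example from the text: defect status and checking status
# are independent, so P(A and B) = P(A) x P(B).
p_def, p_acc = 0.2, 0.8   # defective / acceptable
p_chk, p_not = 0.1, 0.9   # checked / not checked

combos = {
    "D and N": p_def * p_not,
    "D and C": p_def * p_chk,
    "A and N": p_acc * p_not,
    "A and C": p_acc * p_chk,
}
for name, p in combos.items():
    print(name, round(p, 4))

# The four combinations are a complete list of mutually exclusive events,
# so their probabilities sum to 1.
print(round(sum(combos.values()), 4))  # 1.0
```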
a) A machine produces two batches of items. The first batch contains 1,000 items of
which 20 are damaged. The second batch contains 10,000 items of which 50 are
damaged. If one item is taken from each batch, what is the probability that both items
are defective?
P(D1) = 20/1000 = 0.02 and P(D2) = 50/10000 = 0.005. Since these two events are independent:
P(D1 and D2) = P(D1) x P(D2) = 0.02 x 0.005 = 0.0001.
b) A card is drawn at random from a well shuffled pack of playing cards. What is the
probability that the card is a heart? What is the probability that the card is a three?
What is the probability that the card is the three of hearts?

P(heart) = 13/52 = 1/4. P(three) = 4/52 = 1/13. Since the suit and value of a card are
independent, P(three of hearts) = 1/4 x 1/13 = 1/52.
c) A dice is thrown three times. What is the probability of one or more sixes in these
three throws?

Using complementary events, P(one or more sixes) = 1 - P(no sixes)
= 1 - (5/6)^3 = 1 - 125/216 = 91/216 = 0.42 (approx.).
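The complementary-event shortcut can be double-checked by brute force; the Python sketch below (illustrative only) computes 1 - (5/6)^3 and also enumerates all 216 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

# Complementary events: P(one or more sixes) = 1 - P(no sixes).
p_at_least_one_six = 1 - Fraction(5, 6) ** 3
print(p_at_least_one_six)  # 91/216

# Cross-check by enumerating all 6**3 = 216 equally likely outcomes:
outcomes = list(product(range(1, 7), repeat=3))
count = sum(1 for o in outcomes if 6 in o)
print(Fraction(count, len(outcomes)))  # 91/216
```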
Distinguishing the Laws
Although the above laws of probability are not complicated, you must think carefully and
clearly when using them. Remember that events must be mutually exclusive before you can
use the addition law, and they must be independent before you can use the multiplication law.
Another matter about which you must be careful is the listing of equally likely outcomes. Be
sure that you list all of them. For example, we can list the possible results of tossing two
coins, namely:
First Coin Second Coin
Heads Heads
Tails Heads
Heads Tails
Tails Tails
There are four equally likely outcomes. Do not make the mistake of saying, for example, that
there are only two outcomes (both heads or not both heads); you must list all the possible
outcomes. (In this case “not both heads” can result in three different ways, so the probability
of this result will be higher than “both heads”.)
In this example, the probability that there will be one heads and one tails (heads-tails or
tails-heads) is 0.5. This is a case of the addition law at work: the probability of heads-tails
(1/4) plus the probability of tails-heads (1/4). Putting it another way, the probability of
different faces is equal to the probability of the same faces - in both cases 1/2.
A compound experiment, i.e. one with more than one component part, may be regarded as a
sequence of similar experiments. For example, the rolling of two dice can be considered as
the rolling of one followed by the rolling of the other; and the tossing of four coins can be
thought of as tossing one after the other. A tree diagram enables us to construct an exhaustive
list of mutually exclusive outcomes of a compound experiment.
Furthermore, a tree diagram gives us a pictorial representation of probability.
By exhaustive, we mean that every possible outcome is considered.
By mutually exclusive we mean, as before, that if one of the outcomes of the compound
experiment occurs then the others cannot.
a) The concept can be illustrated using the example of a bag containing five red and
three white billiard balls. If two are selected at random without replacement, what is
the probability that one of each colour is drawn?
We can represent this as a tree diagram as in Figure 1.
N.B. R indicates red ball
W indicates white ball.
Probabilities at each stage are shown alongside the branches of the tree.
Figure 1.1
Table 1.1
RR: 5/8 x 4/7 = 20/56
RW: 5/8 x 3/7 = 15/56
WR: 3/8 x 5/7 = 15/56
WW: 3/8 x 2/7 = 6/56
Total = 56/56 = 1
We work from left to right in the tree diagram. At the start we take a ball from the bag. This
ball is either red or white so we draw two branches labelled R and W, corresponding to the
two possibilities. We then also write on the branch the probability of the outcome of this
simple experiment being along that branch.
We then consider drawing a second ball from the bag. Whether we draw a red or a white ball
the first time, we can still draw a red or a white ball the second time, so we mark in the two
possibilities at the end of each of the two branches of our existing tree diagram. We can then
see that there are four different mutually exclusive outcomes possible, namely RR, RW, WR
and WW. We enter on these second branches the conditional probabilities associated with
them. Thus, on the uppermost branch in the diagram we must insert the probability of
obtaining a second red ball given that the first was red. This probability is 4/7 as there are
only seven balls left in the bag, of which four are red. Similarly for the other branches.
Each complete branch from start to tip represents one possible outcome of the compound
experiment and each of the branches is mutually exclusive. To obtain the probability of a
particular outcome of the compound experiment occurring, we multiply the probabilities
along the different sections of the branch, using the general multiplication law for dependent events.
We thus obtain the probabilities shown in Table 1.1. The sum of the probabilities should add
up to 1, as we know one or other of these mutually exclusive outcomes is certain to happen.
The probability of drawing one ball of each colour is therefore
P(RW) + P(WR) = 15/56 + 15/56 = 30/56 = 15/28.
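The tree-diagram result can be confirmed by enumerating every ordered pair of balls; the following Python sketch (illustrative only) counts the draws that yield one ball of each colour:

```python
from fractions import Fraction
from itertools import permutations

# Bag of five red (R) and three white (W) balls; two drawn without replacement.
bag = ["R"] * 5 + ["W"] * 3
draws = list(permutations(range(len(bag)), 2))  # 8 x 7 = 56 ordered pairs

count = sum(1 for i, j in draws if bag[i] != bag[j])
p_one_of_each = Fraction(count, len(draws))
print(p_one_of_each)  # 15/28
```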
b) A bag contains three red balls, two white balls and one blue ball. Two balls are drawn
at random (without replacement). Find the probability that:
i. Both white balls are drawn.
ii. The blue ball is not drawn.
iii. A red then a white are drawn.
iv. A red and a white are drawn.
To solve this problem, let us build up a tree diagram.
Figure 1.2
The first ball drawn has a subscript of 1, e.g. red first = R1. The second ball drawn has a
subscript of 2.
Note there is only one blue ball in the bag, so if we picked a blue ball first then we can have
only a red or a white second ball. Also, whatever colour is chosen first, there are only five
balls left as we do not have replacement.
Figure 1.3
We can now list all the possible outcomes, with their associated probabilities:
Table 1.2
R1R2: 3/6 x 2/5 = 6/30
R1W2: 3/6 x 2/5 = 6/30
R1B2: 3/6 x 1/5 = 3/30
W1R2: 2/6 x 3/5 = 6/30
W1W2: 2/6 x 1/5 = 2/30
W1B2: 2/6 x 1/5 = 2/30
B1R2: 1/6 x 3/5 = 3/30
B1W2: 1/6 x 2/5 = 2/30
Total = 30/30 = 1
It is possible to read off the probabilities we require from Table 1.2.
(i) Probability that both white balls are drawn:
P(W1W2) = 2/6 x 1/5 = 2/30 = 1/15
(ii) Probability the blue ball is not drawn:
P(no blue) = 5/6 x 4/5 = 20/30 = 2/3
(iii) Probability that a red then a white are drawn:
P(R1W2) = 3/6 x 2/5 = 6/30 = 1/5
(iv) Probability that a red and a white are drawn:
P(R1W2) + P(W1R2) = 6/30 + 6/30 = 12/30 = 2/5
c) A couple go on having children, to a maximum of four, until they have a son. Draw a
tree diagram to find the possible family sizes and calculate the probability that they
have a son.
We assume that any one child is equally likely to be a boy or a girl, i.e. P(B) = P(G) = 1/2 .
Note that once they have produced a son, they do not have any more children. The tree
diagram will be as in Figure 1.4.
Figure 1.4
Table 1.3
Possible Families
1 Boy: 1/2
1 Girl, 1 Boy: (1/2)^2 = 1/4
2 Girls, 1 Boy: (1/2)^3 = 1/8
3 Girls, 1 Boy: (1/2)^4 = 1/16
4 Girls: (1/2)^4 = 1/16
Total = 1
Probability they have a son is therefore:
P(son) = 1 - P(4 girls) = 1 - 1/16 = 15/16.
The binomial distribution can be used to describe the likely outcome of events for discrete
variables which:
(a) Have only two possible outcomes; and
(b) Are independent.
Suppose we are conducting a questionnaire. The Binomial distribution might be used to
analyse the results if the only two responses to a question are ‘yes’ or ‘no’, and if the response
to one question does not influence the likely response to any other question.
Put rather more formally, the Binomial distribution occurs when there are n independent trials
(or tests) with the probability of ‘success’ or ‘failure’ in each trial (or test) being constant.
Let p = the probability of ‘success’
Let q = the probability of ‘failure’
Then q = 1 – p
For example, if we toss an unbiased coin ten times, we might wish to find the probability of
getting four heads. Here n = 10, p (head) = 0.5 and q (tail) = 1 - p = 0.5.
The probability of obtaining r ‘successes’ in n trials (tests) is given by the following formula:
P(r) = nCr p^r q^(n-r)
where nCr is the number of combinations of r items from n.
The probability of getting exactly four heads out of ten tosses of an unbiased coin can
therefore be calculated as:
P(4) = 10C4 x (0.5)^4 x (0.5)^6
10C4 = 10!/(4! x 6!) = 210
so P(4) = 210 x (0.5)^4 x (0.5)^6
P(4) = 210 x 0.0625 x 0.015625
P(4) = 0.2051
In other words the probability of getting exactly four heads out of ten tosses of an unbiased
coin is 0.2051 or 20.51%.
It may be useful to state the formulae for finding all the possible probabilities of obtaining r
successes in n trials.
Where P(r) = nCr p^r q^(n-r)
and r = 0, 1, 2, 3, ... n
then, from our knowledge of combinations:
P(0) = q^n
P(1) = npq^(n-1)
P(2) = [n(n-1)/2!] p^2 q^(n-2)
P(3) = [n(n-1)(n-2)/3!] p^3 q^(n-3)
P(4) = [n(n-1)(n-2)(n-3)/4!] p^4 q^(n-4)
...
P(n-2) = [n(n-1)/2!] p^(n-2) q^2
P(n-1) = np^(n-1) q
P(n) = p^n
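These formulae are easy to evaluate programmatically. The Python sketch below (illustrative; the function name binomial_pmf is our own) reproduces the coin example:

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(r 'successes' in n independent trials, each with success probability p)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

# Four heads in ten tosses of an unbiased coin:
print(round(binomial_pmf(4, 10, 0.5), 4))  # 0.2051

# The probabilities for r = 0, 1, ..., n form a complete list and sum to 1:
total = sum(binomial_pmf(r, 10, 0.5) for r in range(11))
print(total)  # 1.0
```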
The Poisson distribution may be regarded as a special case of the binomial distribution. As
with the Binomial distribution, the Poisson distribution can be used where there are only two
possible outcomes:-
1. Success (p)
2. Failure (q)
These events are independent. The Poisson distribution is usually used where n is very large
but p is very small, and where the mean np is constant and typically < 5. As p is very small
(p < 0.1 and often much less), then the chance of the event occurring is extremely low. The
Poisson distribution is therefore typically used for unlikely events such as accidents, strikes etc.
The Poisson distribution is also used to solve problems where events tend to occur at random,
such as incoming phone calls, passenger arrivals at a terminal etc.
Whereas the formula for solving Binomial problems uses the probabilities, for both “success”
(p) and “failure” (q), the formula for solving Poisson problems only uses the probabilities for
“success” (p).
If µ is the mean, it is possible to show that the probability of r successes is given by the formula:
P(r) = µ^r e^(-µ) / r!
where e = exponential constant = 2.7183
µ = mean number of successes = np
n = number of trials
p = probability of “success”
r = number of successes
If we substitute r = 0, 1, 2, 3, 4, 5 ... in this formula we obtain the following expressions:
P(0) = e^(-µ)
P(1) = µe^(-µ)
P(2) = µ^2 e^(-µ) / 2!
P(3) = µ^3 e^(-µ) / 3!
P(4) = µ^4 e^(-µ) / 4!
P(5) = µ^5 e^(-µ) / 5!
In questions you are either given the mean µ or you have to find µ from the information
given, which is usually data for n and p; µ is then obtained from the relationship µ = np.
You have to be able to work out e raised to a negative power.
e^-3 is the same as 1/e^3, so you can simply work this out as 1/2.7183^3.
Alternatively, many calculators have a key marked e^x. The easiest way to find e^-3 on your
calculator is to enter 3, press the +/- key, then press the e^x key; you should obtain 0.049787. If your
calculator does not have an e^x key but has an x^y key, enter 2.7183, press the x^y key, enter 3, press the
+/- key, then press the = key; you should obtain 0.049786.
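The Poisson probabilities can likewise be computed directly; the Python sketch below (illustrative, with our own function name poisson_pmf) reproduces the value of e^-3 discussed above:

```python
from math import exp, factorial

def poisson_pmf(r, mu):
    """P(r 'successes') when the mean number of successes is mu = np."""
    return mu**r * exp(-mu) / factorial(r)

# e raised to the power -3, as discussed in the text:
print(round(exp(-3), 6))  # 0.049787

# With mean mu = 3, the first few probabilities:
for r in range(3):
    print(r, round(poisson_pmf(r, 3), 4))
```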
Venn diagrams or set diagrams are diagrams that show all possible logical relations between
a finite collection of sets.
A Venn diagram is constructed with a collection of simple closed curves drawn in a plane.
The principle of these diagrams is that classes (or sets) be represented by regions in such
relation to one another that all the possible logical relations of these classes can be indicated
in the same diagram. That is, the diagram initially leaves room for any possible relation of
the classes, and the actual or given relation can then be specified by indicating that some
particular region is null or is not null.
Venn diagrams normally comprise overlapping circles. The interior of the circle
symbolically represents the elements of the set, while the exterior represents elements that are
not members of the set.
For example: in a two-set Venn diagram, one circle may represent the group of all wooden
objects, while another circle may represent the set of all tables. The overlapping area or
intersection would then represent the set of all wooden tables.
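The wooden-objects-and-tables example maps directly onto set operations; the Python sketch below uses made-up members (labelled as such) purely for illustration:

```python
# Hypothetical members, chosen only to illustrate the idea:
wooden_objects = {"chair", "table", "spoon"}
tables = {"table", "glass table"}

# The intersection is the set of wooden tables:
print(wooden_objects & tables)  # {'table'}

# The exterior of one set relative to another: wooden objects that are not tables.
print(wooden_objects - tables)
```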
Venn Diagram that shows the intersections of the Greek, Latin and Russian alphabets (upper
case letters)
Collection of Data
Collection of Data - Preliminary Considerations
Exact Definition of the Problem
Definition of the Units
Scope of the Enquiry
Accuracy of the Data
Types of Data
Primary and Secondary Data
Quantitative/Qualitative Categorisation
Continuous/Discrete Categorisation
Requirements of Statistical Data
Accurate Definition
Methods of Collecting Data
Published Statistics
Personal Investigation/Interview
Delegated Personal Investigation/Interview
Advantages of Interviewing
Disadvantages of Interviewing
Designing the Questionnaire
An Example
Choice of Method
Pareto Distribution and the “80:20” Rule
Even before the collection of data starts, there are some important points to consider when
planning a statistical investigation. Shortly I will give you a list of these together with a few
notes on each; some of them you may think obvious or trivial, but do not neglect to learn
them because they are very often the points which are overlooked. Furthermore, examiners
like to have lists as complete as possible when they ask for them!
What, then, are these preliminary matters?
Exact Definition of the Problem
This is necessary in order to ensure that nothing important is omitted from the enquiry, and
that effort is not wasted by collecting irrelevant data. The problem as originally put to the
statistician is often of a very general type and it needs to be specified precisely before work
can begin.
Definition of the Units
The results must appear in comparable units for any analysis to be valid. If the analysis is
going to involve comparisons, then the data must all be in the same units. It is no use just
asking for “output” from several factories - some may give their answers in numbers of items,
some in weight of items, some in number of inspected batches and so on.
Scope of the Enquiry
No investigation should get under way without defining the field to be covered. Are we
interested in all departments of our business, or only some? Are we to concern ourselves with
our own business only, or with others of the same kind?
Accuracy of the Data
To what degree of accuracy is data to be recorded? For example, are ages of individuals to be
given to the nearest year or to the nearest month or as the number of completed years? If
some of the data is to come from measurements, then the accuracy of the measuring
instrument will determine the accuracy of the results. The degree of precision required in an
estimate might affect the amount of data we need to collect. In general, the more precisely we
wish to estimate a value, the more readings we need to take.
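The trade-off between precision and the number of readings can be made concrete with the standard error of a sample mean, SE = σ/√n. This relation is not stated in the text but is the usual basis for the rule of thumb; a sketch, with an arbitrary illustrative standard deviation:

```python
import math

# Sketch of the precision/sample-size trade-off using the standard error of
# a sample mean, SE = sigma / sqrt(n). sigma = 10 is an arbitrary
# illustrative population standard deviation.
sigma = 10.0
for n in (25, 100, 400):
    se = sigma / math.sqrt(n)
    print(f"n = {n}: standard error = {se:.2f}")
# n = 25: standard error = 2.00
# n = 100: standard error = 1.00
# n = 400: standard error = 0.50
```

Halving the standard error requires roughly four times as many readings, which is why more precise estimates demand substantially more data.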
Primary and Secondary Data
In its strictest sense, primary data is data which is both original and has been obtained in
order to solve the specific problem in hand. Primary data is therefore raw data and has to be
classified and processed using appropriate statistical methods in order to reach a solution to
the problem.
Secondary data is any data other than primary data. Thus it includes any data which has been
subject to the processes of classification or tabulation or which has resulted from the
application of statistical methods to primary data, and all published statistics.
Quantitative/Qualitative Categorisation
Variables may be either quantitative or qualitative. Quantitative variables, to which we shall
restrict discussion here, are those for which observations are numerical in nature. Qualitative
variables have non-numeric observations, such as colour of hair, although, of course, each
possible non-numeric value may be associated with a numeric frequency.
Continuous/Discrete Categorisation
Variables may be either continuous or discrete. A continuous variable may take any value
between two stated limits (which may possibly be minus and plus infinity). Height, for
example, is a continuous variable, because a person’s height may (with appropriately accurate
equipment) be measured to any minute fraction of a millimetre. A discrete variable, however,
can take only certain values occurring at intervals between stated limits. For most (but not all)
discrete variables, these interval values are the set of integers (whole numbers).
For example, if the variable is the number of children per family, then the only possible
values are 0, 1, 2, ... etc. because it is impossible to have other than a whole number of
children. However, in Ireland, shoe sizes are stated in half-units, and so here we have an
example of a discrete variable which can take the values 1, 1½, 2, 2½, etc.
Having decided upon the preliminary matters about the investigation, the statistician must
look in more detail at the actual data to be collected. The desirable qualities of statistical data
are the following:
Homogeneity
The data must be in properly comparable units. “Five houses” means little since five
dwelling houses are very different from five ancestral castles. Houses cannot be compared
unless they are of a similar size or value. If the data is found not to be homogeneous, there
are two methods of adjustment possible.
a) Break down the group into smaller component groups which are homogeneous and
study them separately.
b) Standardise the data. Use units such as “output per man-hour” to compare the output
of two factories of very different size. Alternatively, determine a relationship between
the different units so that all may be expressed in terms of one; in food consumption
surveys, for example, a child may be considered equal to half an adult.
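Point (b), standardising the data, can be sketched as follows; the factory figures are invented for illustration:

```python
# A sketch of standardisation: "output per man-hour" lets two factories of
# very different size be compared. All figures are invented.
factories = {
    "Factory A": {"output_items": 120_000, "man_hours": 40_000},
    "Factory B": {"output_items": 15_000, "man_hours": 4_000},
}

for name, f in factories.items():
    per_man_hour = f["output_items"] / f["man_hours"]
    print(f"{name}: {per_man_hour:.2f} items per man-hour")
# Factory A: 3.00 items per man-hour
# Factory B: 3.75 items per man-hour
```

Although Factory A produces eight times as many items, the standardised figures show that Factory B is the more productive per man-hour.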
Completeness
Great care must be taken to ensure that no important aspect is omitted from the enquiry.
Accurate Definition
Each term used in an investigation must be carefully defined; it is so easy to be slack about
this and to run into trouble. For example, the term “accident” may mean quite different things
to the injured party, the police and the insurance company! Watch out also, when using other
people’s statistics, for changes in definition. Laws may, for example, alter the definition of an
“indictable offence” or of an “unemployed person”.
The circumstances of the data must remain the same throughout the whole investigation. It is
no use, for example, comparing the average age of workers in an industry at two different
times if the age structure has changed markedly. Likewise, it is not much use comparing a
firm’s profits at two different times if the working capital has changed.
When all the foregoing matters have been dealt with, we come to the question of how to
collect the data we require. The methods usually available are as follows:
Use of published statistics
Personal investigation/interview
Delegated personal investigation/interview
Published Statistics
Sometimes we may be attempting to solve a problem that does not require us to collect new
information, but only to reassemble and reanalyse data which has already been collected by
someone else for some other purpose.
We can often make good use of the great amount of statistical data published by
governments, the United Nations, nationalised industries, chambers of trade and commerce
and so on. When using this method, it is particularly important to be clear on the definition of
terms and units and on the accuracy of the data. The source must be reliable and the
information up-to-date.
This type of data is sometimes referred to as secondary data in that the investigator himself
has not been responsible for collecting it and it thus came to him “second-hand”. By contrast,
data which has been collected by the investigator for the particular survey in hand is called
primary data.
The information you require may not be found in one source but parts may appear in several
different sources. Although the search through these may be time-consuming, it can lead to
data being obtained relatively cheaply and this is one of the advantages of this type of data
collection. Of course, the disadvantage is that you could spend a considerable amount of time
looking for information which may not be available.
Another disadvantage of using data from published sources is that the definitions used for
variables and units may not be the same as those you wish to use. It is sometimes difficult to
establish the definitions from published information, but, before using the data, you must
establish what it represents.
Personal Investigation/Interview
In this method the investigator collects the data himself. The field he can cover is, naturally,
limited. The method has the advantage that the data will be collected in a uniform manner
and with the subsequent analysis in mind. There is sometimes a danger to be guarded against
though, namely that the investigator may be tempted to select data that accords with some of
his preconceived notions.
The personal investigation method is also useful if a pilot survey is carried out prior to the
main survey, as personal investigation will reveal the problems that are likely to occur.
Delegated Personal Investigation/Interview
When the field to be covered is extensive, the task of collecting information may be too great
for one person. Then a team of selected and trained investigators or interviewers may be
used. The people employed should be properly trained and informed of the purposes of the
investigation; their instructions must be very carefully prepared to ensure that the results are
in accordance with the “requirements” described in the previous section of this study unit. If
there are many investigators, personal biases may tend to cancel out.
Care in allocating the duties to the investigators can reduce the risks of bias. For example, if
you are investigating the public attitude to a new drug in two towns, do not put investigator A
to explore town X and investigator B to explore town Y, because any difference that is
revealed might be due to the towns being different, or it might be due to different personal
biases on the part of the two investigators. In such a case, you would try to get both people to
do part of each town.
In some enquiries the data consists of information which must be supplied by a large number
of people. Then a very convenient way to collect the data is to issue questionnaire forms to
the people concerned and ask them to fill in the answers to a set of printed questions. This
method is usually cheaper than delegated personal investigation and can cover a wider field.
A carefully thought-out questionnaire is often also used in the previous methods of
investigation in order to reduce the effect of personal bias.
The distribution and collection of questionnaires by post suffers from two main drawbacks:
a) The forms are completed by people who may be unaware of some of the requirements
and who may place different interpretations on the questions - even the most carefully
worded ones!
b) There may be a large number of forms not returned, and these may be mainly by
people who are not interested in the subject or who are hostile to the enquiry. The
result is that we end up with completed forms only from a certain kind of person and
thus have a biased sample.
It is essential to include a reply-paid envelope to encourage people to respond.
If the forms are distributed and collected by interviewers, a greater response is likely and
queries can be answered. This is the method used, for example, in the Population Census.
Care must be taken, however, that the interviewers do not lead respondents in any way.
Advantages of Interviewing
There are many advantages of using interviewers in order to collect information.
The major one is that a large amount of data can be collected relatively quickly and cheaply.
If you have selected the respondents properly and trained the interviewers thoroughly, then
there should be few problems with the collection of the data.
This method has the added advantage of being very versatile since a good interviewer can
adapt the interview to the needs of the respondent. Similarly, if the answers given to the
questions are not clear, then the interviewer can ask the respondent to elaborate on them.
When this is necessary, the interviewer must be very careful not to lead the respondent into
altering rather than clarifying the original answers. The technique for dealing with this
problem must be tackled at the training stage.
This “face-to-face” technique will usually produce a high response rate. The response rate is
determined by the proportion of interviews that are successful.
Another advantage of this method of collecting data is that with a well-designed
questionnaire it is possible to ask a large number of short questions of the respondent in one
interview. This naturally means that the cost per question is lower than in any other method.
Disadvantages of Interviewing
Probably the biggest disadvantage of this method of collecting data is that the use of a large
number of interviewers leads to a loss of direct control by the planners of the survey.
Mistakes in selecting interviewers and any inadequacy of the training programme may not be
recognised until the interpretative stage of the survey is reached. This highlights the need to
train interviewers correctly. It is particularly important to ensure that all interviewers ask
questions in a similar manner. Even with the best will in the world, it is possible that an
inexperienced interviewer, just by changing the tone of his or her voice, may give a different
emphasis to a question than was originally intended.
In spite of these difficulties, this method of data collection is widely used as questions can be
answered cheaply and quickly and, given the correct approach, the technique can achieve
high response rates.
A "questionnaire" can be defined as "a formulated series of questions, an interrogatory" and
this is precisely what it is. For a statistical enquiry, the questionnaire consists of a sheet (or
possibly sheets) of paper on which there is a list of questions the answers to which will form
the data to be analysed. When we talk about the "questionnaire method" of collecting data,
we usually have in mind that the questionnaires are sent out by post or are delivered at
people’s homes or offices and left for them to complete. In fact, however, the method is very
often used as a tool in the personal investigation methods already described.
The principles to be observed when designing a questionnaire are as follows:
a) Keep it as short as possible, consistent with getting the right results.
b) Explain the purpose of the investigation so as to encourage people to give the answers.
c) Individual questions should be as short and simple as possible.
d) If possible, only short and definite answers like "Yes", "No", or a number of some
sort should be called for.
e) Questions should be capable of only one interpretation.
f) There should be a clear logic in the order in which the questions are asked.
g) There should be no leading questions which suggest the preferred answer.
h) The layout should allow easy transfer for computer input.
i) Where possible, use the "alternative answer" system in which the respondent has to
choose between several specified answers.
j) The respondent should be assured that the answers will be treated confidentially and
that the truth will not be used to his or her detriment.
k) No calculations should be required of the respondent.
The above principles should always be applied when designing a questionnaire and, in
addition, you should understand them well enough to be able to remember them all if you are
asked for them in an examination question. They are principles and not rigid rules - often one
has to go against some of them in order to get the right information. Governments can often
ignore these principles because they can make the completion of the questionnaire
compulsory by law, but other investigators must follow the rules as far as practicable in
order to make the questionnaire as easy to complete as possible - otherwise they will receive
no replies.
An Example
An actual example of a self-completion questionnaire (Figures 2.1 to 2.4) is now shown, as
used by an educational establishment in a research survey. Note that, as the questionnaire is incorporated
in this booklet, it does not give a true format. In practice, the questionnaire was not spread
over so many pages.
Figure 2.1
Figure 2.2
Figure 2.3
Figure 2.4
Choice is difficult between the various methods, as the type of information required will
often determine the method of collection. If the data is easily obtained by automatic methods
or can be observed by the human eye without a great deal of trouble, then the choice is easy.
The problem comes when it is necessary to obtain information by questioning respondents.
The best guide is to ask yourself whether the information you want requires an attitude or
opinion or whether it can be acquired from short yes/no type or similar simple answers. If it is
the former, then it is best to use an interviewer to get the information; if the latter type of data
is required, then a postal questionnaire would be more useful.
Do not forget to check published sources first to see if the information can be found from
data collected for another survey.
Another yardstick worth using is time. If the data must be collected quickly, then use an
interviewer and a short simple questionnaire. However, if time is less important than cost,
then use a postal questionnaire, since this method may take a long time to collect relatively
limited data, but is cheap.
Sometimes a question in the examination paper is devoted to this subject. The tendency is for
the question to state the type of information required and ask you to describe the appropriate
method of data collection giving reasons for your choice.
More commonly, specific definitions and explanations of various terms, such as interviewer
bias, are contained in multi-part questions.
The Pareto distribution, named after the Italian economist Vilfredo Pareto, is a power law
probability distribution that describes social, scientific, geophysical, actuarial, and many
other types of observable phenomena.
Probability density function
Pareto Type I probability density functions for various α (labeled "k") with xm = 1. The
horizontal axis is the x parameter. As α → ∞ the distribution approaches δ(x − xm), where δ
is the Dirac delta function.
Cumulative distribution function
Pareto Type I cumulative distribution functions for various α (labeled "k") with xm = 1. The
horizontal axis is the x parameter.
The “80:20 law”, according to which 20% of all people receive 80% of all income, and 20%
of the most affluent 20% receive 80% of that 80%, and so on, holds precisely when the
Pareto index is α = log4(5) = log(5)/log(4), approximately 1.161.
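These figures are easy to verify. The sketch below computes the quoted index and, using the standard Pareto (Type I) result that the richest fraction p of the population receives a share p^(1 − 1/α) of total income, checks that the top 20% receive 80%:

```python
import math

# Check of the index quoted in the text: the Pareto index for which the
# 80:20 rule holds exactly is alpha = log base-4 of 5.
alpha = math.log(5) / math.log(4)
print(round(alpha, 3))  # 1.161

# Standard Pareto (Type I) result: the richest fraction p of the population
# receives a share p ** (1 - 1/alpha) of total income.
top_share = 0.20 ** (1 - 1 / alpha)
print(round(top_share, 3))  # 0.8
```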
Project managers know that 20% of the work consumes 80% of their time and resources.
You can apply the 80/20 rule to almost anything, from the science of management to the
physical world.
80% of your sales will come from 20% of your sales staff. 20% of your staff will cause 80%
of your problems, but another 20% of your staff will provide 80% of your production. It
works both ways.
The value of the Pareto Principle for a manager is that it reminds you to focus on the 20%
that matters. Of the things you do during your day, only 20% really matter. Those 20%
produce 80% of your results. Identify and focus on those things.
Tabulation and Grouping of Data
Introduction to Classification and Tabulation of Data
Forms of Tabulation
Simple Tabulation
Complex Tabulation
Secondary Statistical Tabulation
Rules for Tabulation
The Rules
An Example of Tabulation
Sources of Data & Presentation Methods
Source, nature, application and use
Role of statistics in business analysis and decision making
Numerical data
Having completed the survey and collected the data, we need to organise it so that we can
extract useful information and then present our results. The information will very often
consist of a mass of figures in no very special order. For example, we may have a card index
of the 3,000 workers in a large factory; the cards are probably kept in alphabetical order of
names, but they will contain a large amount of other data such as wage rates, age, sex, type of
work, technical qualifications and so on. If we are required to present to the factory
management a statement about the age structure of the labour force (both male and female),
then the alphabetical arrangement does not help us, and no one could possibly gain any idea
about the topic from merely looking through the cards as they are. What is needed is to
classify the cards according to the age and sex of the worker and then present the results of
the classification as a tabulation. The data in its original form, before classification, is usually
known as “raw data”.
We cannot, of course, give here an example involving 3,000 cards, but you ought now to
follow this “shortened version” involving only a small number of items.
a) Raw Data
15 cards in alphabetical order:
Ayim, L. Mr 39 years
Balewa, W. Mrs 20
Buhari, A. Mr 22
Boro, W. Miss 22
Chahine, S. Miss 32
Diop, T. Mr 30
Diya, C. Mrs 37
Eze, D. Mr 33
Egwu, R. Mr 45
Gowon, J. Mrs 42
Gaxa, F. Miss 24
Gueye, W. Mr 27
Jalloh, J. Miss 28
Jaja, J. Mr 44
Jang, L. Mr 39
b) Classification
(i) According to Sex
Ayim, L. Mr 39 years Balewa, W. Mrs 20 years
Buhari, A. Mr 22 Boro, W. Miss 22
Diop, T. Mr 30 Chahine, S. Miss 32
Eze, D. Mr 33 Diya. C. Mrs 37
Egwu, R. Mr 45 Gowon, J. Mrs 42
Gueye, W. Mr 27 Gaxa, F. Miss 24
Jaja, J. Mr 44 Jalloh, J. Miss 28
Jang, L. Mr 39
(ii) According to Age (in Groups)
Balewa, W. Mrs 20 years Ayim, L. Mr 39 years
Buhari, A. Mr 22 Chahine, S. Miss 32
Boro, W. Miss 22 Diop, T. Mr 30
Gaxa, F. Miss 24 Diya, C. Mrs 37
Gueye, W. Mr 27 Eze, D. Mr 33
Jalloh, J. Miss 28 Jang, L. Mr 39
Egwu, R. Mr 45 years
Gowon, J. Mrs 42
Jaja, J. Mr 44
c) Tabulation
The number of cards in each group, after classification, is counted and the results presented in
a table.
Table 3.2
You should look through this example again to make quite sure that you understand what has
been done.
You are now in a position to appreciate the purpose behind classification and tabulation - it is
to condense an unwieldy mass of raw data to manageable proportions and then to present the
results in a readily understandable form. Be sure that you appreciate this point, because
examination questions involving tabulation often begin with a first part which asks, "What
is the object of the tabulation of statistical data?", or words to that effect.
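As a sketch, the classification and counting above can be reproduced in a few lines of Python, using the titles and ages from the 15 cards. The age groups 20-29, 30-39 and 40-49 are an assumption about how Table 3.2 was grouped:

```python
# A sketch reproducing the classification and tabulation of the 15 cards.
# The age groups 20-29, 30-39 and 40-49 are an assumption about how the
# grouping in the example was drawn.
records = [
    ("Mr", 39), ("Mrs", 20), ("Mr", 22), ("Miss", 22), ("Miss", 32),
    ("Mr", 30), ("Mrs", 37), ("Mr", 33), ("Mr", 45), ("Mrs", 42),
    ("Miss", 24), ("Mr", 27), ("Miss", 28), ("Mr", 44), ("Mr", 39),
]

# Classify each card by sex and by age group, then count each cell.
table = {}
for title, age in records:
    sex = "Male" if title == "Mr" else "Female"
    group = f"{age // 10 * 10}-{age // 10 * 10 + 9}"
    table[(group, sex)] = table.get((group, sex), 0) + 1

for group in ("20-29", "30-39", "40-49"):
    male = table.get((group, "Male"), 0)
    female = table.get((group, "Female"), 0)
    print(f"{group}: {male} male, {female} female, total {male + female}")
# 20-29: 2 male, 4 female, total 6
# 30-39: 4 male, 2 female, total 6
# 40-49: 2 male, 1 female, total 3
```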
We classify the process of tabulation into Simple Tabulation and Complex or Matrix Tabulation.
Simple Tabulation
This covers only one aspect of the set of figures. The idea is best conveyed by an example.
Consider the card index mentioned earlier; each card may carry the name of the workshop in
which the person works. A question as to how the labour force is distributed can be answered
by sorting the cards and preparing a simple table thus:
Table 3.3
Another question might have been, "What is the wage distribution in the works?", and the
answer can be given in another simple table (see Table 3.4).
Table 3.4
Note that such simple tables do not tell us very much - although it may be enough for the
question of the moment.
Complex Tabulation
This deals with two or more aspects of a problem at the same time. In the problem just
studied, it is very likely that the two questions would be asked at the same time, and we could
present the answers in a complex table or matrix.
Table 3.5
Note: *140 - 159.99 is the same as "140 but less than 160", and similarly for the other groups.
This table is much more informative than are the two simple tables, but it is more
complicated. We could have divided the groups further into, say, male and female workers, or
into age groups. In a later part of this study unit I will give you a list of the rules you should
try to follow in compiling statistical tables, and at the end of that list you will find a table
relating to our 3,000 workers, which you should study as you read the rules.
So far, our tables have merely classified the already available figures, the primary statistics,
but we can go further than this and do some simple calculations to produce other figures,
secondary statistics. As an example, take the first simple table illustrated above, and calculate
how many employees there are on average per workshop. This is obtained by dividing the
total (3,000) by the number of shops (5), and the table appears thus:
Table 3.6
This average is a "secondary statistic". For another example, we may take the second simple
table given above and calculate the proportion of workers in each wage group, thus:
Table 3.7
These proportions are "secondary statistics". In commercial and business statistics, it is more
usual to use percentages than proportions; in the above tables these would be 3.5%, 17%,
30.7%, 33.8%, 10% and 5%.
Secondary statistics are not, of course, confined to simple tables; they are used in complex
tables too, as in this example:
The percentage columns and the average line show secondary statistics. All the other figures
are primary statistics.
Note carefully that percentages cannot be added or averaged to get the percentage of a
total or of an average. You must work out such percentages on the totals or averages
themselves.
Another danger in the use of percentages has to be watched, and that is that you must not
forget the size of the original numbers. Take, for example, the case of two doctors dealing
with a certain disease. One doctor has only one patient and he cures him - 100% success! The
other doctor has 100 patients of whom he cures 80 - only 80% success! You can see how very
unfair it would be on the hard-working second doctor to compare the percentages alone.
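The two-doctors example can be checked directly; the sketch below contrasts averaging the percentages with recomputing the percentage from the combined totals:

```python
# The two-doctors example: recompute percentages from totals rather than
# averaging the percentages themselves.
cured = [1, 80]        # patients cured by the first and second doctor
patients = [1, 100]    # patients treated by the first and second doctor

rates = [c / p * 100 for c, p in zip(cured, patients)]
print(rates)  # [100.0, 80.0]

# Wrong: averaging the two percentages gives a misleading figure.
print(sum(rates) / len(rates))  # 90.0

# Right: work the percentage out on the combined totals.
print(round(sum(cured) / sum(patients) * 100, 1))  # 80.2
```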
Table 3.8: Inspection Results for a Factory
Product in Two Successive Years
The Rules
There are no absolute rules for drawing up statistical tables, but there are a few general
principles which, if borne in mind, will help you to present your data in the best possible way.
Here they are:
a) Try not to include too many features in any one table (say, not more than four or five)
as otherwise it becomes rather clumsy. It is better to use two or more separate tables.
b) Each table should have a clear and concise title to indicate its purpose.
c) It should be very clear what units are being used in the table (tonnes, RWF, people,
RWF000, etc.).
d) Blank spaces and long numbers should be avoided, the latter by a sensible degree of rounding.
e) Columns should be numbered to facilitate reference.
f) Try to have some order to the table, using, for example, size, time, geographical
location or alphabetical order.
g) Figures to be compared or contrasted should be placed as close together as possible.
h) Percentages should be placed near to the numbers on which they are based.
i) Rule the tables neatly - scribbled tables with freehand lines nearly always result in
mistakes and are difficult to follow. However, it is useful to draw a rough sketch first
so that you can choose the best layout and decide on the widths of the columns.
j) Insert totals where these are meaningful, but avoid "nonsense totals". Ask yourself
what the total will tell you before you decide to include it. An example of such a
"nonsense total" is given in the following table:
Table 3.9 : Election Results
The totals (470) at the foot of the two columns make sense because they tell us the total
number of seats being contested, but the totals in the final column (550, 390, 940) are
"nonsense totals" for they tell us nothing of value.
k) If numbers need to be totalled, try to place them in a column rather than along a row
for easier computation.
l) If you need to emphasise particular numbers, then underlining, significant spacing or
heavy type can be used. If data is lacking in a particular instance, then insert an
asterisk (*) in the empty space and give the reasons for the lack of data in a footnote.
m) Footnotes can also be used to indicate, for example, the source of secondary data, a
change in the way the data has been recorded, or any special circumstances which
make the data seem odd.
An Example of Tabulation
It is not always possible to obey all of these rules on any one occasion, and there may be
times when you have a good reason for disregarding some of them. But only do so if the
reason is really good - not just to save you the bother of thinking! Study now the layout of the
following table (based on our previous example of 3,000 workpeople) and check through the
list of rules to see how they have been applied.
Table 3.10: ABC & Co. Wage Structure of Labour Force Numbers
of Persons in Specified Categories
Note (a) Total no. employed in workshop as a percentage of the total workforce.
Note (b) Total no. in wage group as a percentage of the total workforce.
Table 3.10 can be called a "twofold" table as the workforce is broken down by wage and workshop.
Sources, nature, application and use:
Data is generally found through research or as the result of a survey. Data which is found
from a survey is called primary data; it is data which is collected for a particular reason or
research project - for example, your firm might wish to establish how much money tourists
spend on cultural events when they come to Rwanda, or how long a particular process takes
on average to complete in a factory. In this case the data will be taken in raw form, i.e. lots of
figures, and then analysed by grouping it into more manageable groups. The other
source of data is secondary data. This is data which is already available (government
statistics, company reports etc). As a business person you can take these figures and use them
for whatever purpose you require.
Nature of data.
Data is classified according to the type of data it is. The classifications are as follows:
Categorical data. Example: "Do you currently own any stocks or bonds?" (Yes/No)
This type of data is generally plotted using a bar chart or pie chart.
Numerical data: This is usually divided into discrete or continuous data.
How many cars do you own? This is discrete data - data that arises from a counting process.
How tall are you? This is continuous data - data that arises from a measuring process, where
the figures cannot be measured precisely. For example: clock-in times of the workers in a
particular shift: 8:23; 8:14; 8:16 ...
Whether data is discrete or continuous will determine the most appropriate method of presentation.
Precautions in use
As a business person it is important that you are cautious when reading data and statistics. In
order to draw intelligent and logical conclusions from data you need to understand the
various meanings of statistical terms.
Role of statistics in business analysis and decision making.
In the business world, statistics has four important applications:
To summarise business data
To draw conclusions from that data
To make reliable forecasts about business activities
To improve business processes.
The field of statistics is generally divided into two areas: descriptive statistics and
inferential statistics.
Figure 3.1
Descriptive statistics allows you to create different tables and charts to summarise data. It
also provides statistical measures such as the mean, median, mode, standard deviation etc to
describe different characteristics of the data.
Figure 3.2
Drawing conclusions about your data is the fundamental point of inferential statistics. Using
these methods allows the researcher to draw conclusions based on data rather than on
intuition.
Improving business processes involves using managerial approaches that focus on quality
improvements, such as Six Sigma. These approaches are data driven and use statistical
methods to develop their models.
Presentation of data: use of bar charts, histograms, pie charts, graphs, tables, frequency
distributions, cumulative distributions and ogives - their uses and interpretations.
If you look at any magazine or newspaper article, TV show, election campaign etc you will
see many different charts depicting anything from the most popular holiday destination to the
gain in company profits. The nice thing about studying statistics is that once you understand
the concepts, the theory remains the same for all situations and you can easily apply your
knowledge to whatever situation you are in.
Tables and charts for categorical data:
When you have categorical data, you tally responses into categories and then present the
frequency or percentage in each category in tables and charts.
The summary table indicates the frequency, amount or percentage of items in each category,
so that you can differentiate between the categories.
Supposing a questionnaire asked people how they preferred to do their banking:
Table 3.11: Banking preference
Categories: in bank, ATM, telephone, internet
The above information could be illustrated using a bar chart:
Figure 3.3: Bar chart of banking preference
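The tallying and percentaging described above can be sketched with Python's `collections.Counter`. Since the actual frequencies from Table 3.11 are not reproduced here, the response counts below are invented for illustration:

```python
from collections import Counter

# Invented responses to "How do you prefer to do your banking?" - the real
# Table 3.11 frequencies are not reproduced in the text.
responses = (["in bank"] * 16 + ["ATM"] * 21 +
             ["telephone"] * 9 + ["internet"] * 14)

counts = Counter(responses)
total = sum(counts.values())

# Summary table: frequency and percentage per category, largest first.
for category, n in counts.most_common():
    print(category, n, f"{n / total * 100:.1f}%")
# ATM 21 35.0%
# in bank 16 26.7%
# internet 14 23.3%
# telephone 9 15.0%
```

These frequencies and percentages are the figures a bar chart or pie chart of the table would display.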
Or a pie chart
Figure 3.4
A simple line chart is usually used for time series data, where data is given over time.
The price of an average mobile homes over the past 3 years
Table 3.12
Price RWF
RWF350 000
RWF252 000
RWF190 000
In bank
Page 90
Raw data Grouped
using histogram,
Figure 3.5
The above graphs are used for categorical data.
Numerical Data
Numerical data is generally used more in statistics. It is processed as follows.
Figure 3.6
(x-axis: 2008, 2009, 2010)
The Histogram:
The histogram is like a bar chart but for numerical data. The important thing to remember
about the histogram is that the area of each bar represents, or is proportional to, the
frequency. If you are drawing a histogram for data where the class widths are all the same
then it is very easy. If, however, one class width is wider or narrower than the others, an
adjustment must be made to ensure that the area of the bar is proportional to the frequency.
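The adjustment can be sketched in a few lines of Python: each bar's height is the class frequency scaled by a standard interval divided by the class width, so that area stays proportional to frequency. The classes and frequencies below are hypothetical, purely for illustration:

```python
# Hypothetical classes (lower, upper) and frequencies; the last class is
# twice as wide as the others, so its bar must be drawn at half the height.
classes = [(35, 40), (40, 45), (45, 50), (50, 60)]
freqs = [30, 25, 20, 24]

standard_width = 5  # the "standard" interval the heights are based on

heights = []
for (lo, hi), f in zip(classes, freqs):
    heights.append(f * standard_width / (hi - lo))  # frequency density

for (lo, hi), h in zip(classes, heights):
    print(f"{lo}-{hi}: frequency density {h}")  # the 50-60 bar plots at 12.0, not 24
```

Note how the 50-60 class, with frequency 24 but double the width, is drawn at height 12 so that its area still represents 24.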
Graphical Representation of Information
Introduction to Frequency Distributions
Preparation of Frequency Distributions
Simple Frequency Distribution
Grouped Frequency Distribution
Choice of Class Interval
Cumulative Frequency Distributions
Relative Frequency Distributions
Graphical Representation of Frequency Distributions
Frequency Dot Diagram
Frequency Bar Chart
Frequency Polygon
The Ogive
Introduction to Other Types of Data Presentation
Limited Form
Accurate Form
Pie Charts
Bar Charts
Component Bar Chart
Horizontal Bar Chart
General Rules for Graphical Presentation
The Lorenz Curve
Stages in Construction of a Lorenz Curve
Interpretation of the Curve
Other Uses
A frequency distribution is a tabulation which shows the number of times (i.e. the frequency)
each different value occurs. Refer back to Study Unit 2 and make sure you understand the
difference between "attributes" (or qualitative variables) and "variables" (or quantitative
variables); the term "frequency distribution" is usually confined to the case of variables.
The following figures are the times (in minutes) taken by a shop-floor worker to perform a
given repetitive task on 20 specified occasions during the working day:
3.5 3.8 3.8 3.4 3.6
3.6 3.8 3.9 3.7 3.5
3.4 3.7 3.6 3.8 3.6
3.7 3.7 3.7 3.5 3.9
If we now assemble and tabulate these figures, we obtain a frequency distribution (see
Table 4.1).
Table 4.1
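Tallying such a list into a frequency distribution takes only a few lines of Python (a sketch, using the 20 times above):

```python
from collections import Counter

times = [3.5, 3.8, 3.8, 3.4, 3.6,
         3.6, 3.8, 3.9, 3.7, 3.5,
         3.4, 3.7, 3.6, 3.8, 3.6,
         3.7, 3.7, 3.7, 3.5, 3.9]

freq = Counter(times)  # value -> number of occurrences
for value in sorted(freq):
    print(f"{value} minutes: {freq[value]} occasions")
```

The frequencies total 20, the number of observations, as they must.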
Simple Frequency Distribution
A useful way of preparing a frequency distribution from raw data is to go through the records
as they stand and mark off the items by the "tally mark" or "five-bar gate" method. First look
at the figures to see the highest and lowest values so as to decide the range to be covered and
then prepare a blank table.
Now mark the items on your table by means of a tally mark. To illustrate the procedure, the
following table shows the state of the work after all 20 items have been entered.
Table 4.2
Grouped Frequency Distribution
Sometimes the data is so extensive that a simple frequency distribution is too cumbersome
and, perhaps, uninformative. Then we make use of a "grouped frequency distribution".
In this case, the "length of time" column consists not of separate values but of groups of
values (see Table 4.3).
Table 4.3
Grouped frequency distributions are only needed when there is a large number of values and,
in practice, would not have been required for the small amount of data in our example. Table
4.4 shows a grouped frequency distribution used in a more realistic situation, when an
ungrouped table would not have been of much use.
The various groups (e.g. "25 but less than 30") are called "classes" and the range of values
covered by a class (e.g. five years in this example) is called the "class interval".
The number of items in each class (e.g. 28 in the 25 to 30 class) is called the "class
frequency" and the total number of items (in this example, 220) is called the "total
frequency". As stated before, frequency distributions are usually only considered in
connection with variables and not with attributes, and you will sometimes come across the
term "variate" used to mean the variable in a frequency distribution. The variate in our last
example is "age of worker", and in the previous example the variate was "length of time".
Table 4.4: Age Distribution of Workers in an
The term "class boundary" is used to denote the dividing line between adjacent classes, so in
the age group example the class boundaries are 15, 20, 25, .... years. In the length of time
example, as grouped earlier in this section, the class boundaries are 3.35, 3.55, 3.75, 3.95
minutes. This needs some explanation. As the original readings were given correct to one
decimal place, we assume that is the precision to which they were measured. If we had had a
more precise stopwatch, the times could have been measured more precisely. In the first
group of 3.4 to 3.5 are put times which could in fact be anywhere between 3.35 and 3.55 if
we had been able to measure them more precisely. A time such as 3.57 minutes would not
have been in this group as it equals 3.6 minutes when corrected to one decimal place and it
goes in the 3.6 to 3.7 group.
Another term, "class limits", is used to stand for the lowest and highest values that can
actually occur in a class. In the age group example, these would be 15 years and 19 years 364
days for the first class, 20 years and 24 years 364 days for the second class and so on,
assuming that the ages were measured correct to the nearest day below. In the length of time
example, the class limits are 3.4 and 3.5 minutes for the first class and 3.6 and 3.7 minutes for
the second class.
You should make yourself quite familiar with these terms, and with others which we will
encounter later, because they are all used freely by examiners and you will not be able to
answer questions if you don’t know what the questioner means!
Choice of Class Interval
When compiling a frequency distribution you should, if possible, make the length of the class
interval equal for all classes so that fair comparison can be made between one class and
another. Sometimes, however, this rule has to be broken (official publications often lump
together the last few classes into one so as to save paper and printing costs) and then, before
we use the information, it is as well to make the classes comparable by calculating a column
showing "frequency per interval of so much", as in this example for some wage statistics:
Notice that the intervals in the first column are:
200, 200, 400, 400, 400, 800.
These intervals let you see how the last column was compiled.
A superficial look at the original table (first two columns only) might have suggested that the
most frequent incomes were at the middle of the scale, because of the appearance of the
figure 55,000. But this apparent preponderance of the middle class is due solely to the change
in the length of the class interval, and column three shows that, in fact, the most frequent
incomes are at the bottom end of the scale, i.e. the top of the table.
You should remember that the purpose of compiling a grouped frequency distribution is to
make sense of an otherwise troublesome mass of figures. It follows, therefore, that we do not
want to have too many groups or we will be little better off; nor do we want too few groups
or we will fail to see the significant features of the distribution. As a practical guide, you will
find that somewhere between about five and 20 groups will usually be suitable.
When compiling grouped frequency distributions, we occasionally run into trouble because
some of our values lie exactly on the dividing line between two classes and we wonder which
class to put them into. For example, in the age distribution given earlier in Table 4.4, if we
Table 4.5
have someone aged exactly 40 years, do we put him into the "35-40" group or into the "40-
45" group? There are two possible solutions to this problem:
a) Describe the classes as "x but less than y" as we have done in Table 4.4, and then there
can be no doubt.
b) Where an observation falls exactly on a class boundary, allocate half an item to each
of the adjacent classes. This may result in some frequencies having half units, but this
is not a serious drawback in practice.
The first of these two procedures is the one to be preferred.
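The "x but less than y" convention corresponds to half-open intervals, which are easy to express in code. A minimal Python sketch (the 5-year age boundaries follow the age-distribution example; the function name is ours):

```python
def classify(value, boundaries):
    """Index of the class "boundaries[i] but less than boundaries[i+1]" holding value."""
    for i in range(len(boundaries) - 1):
        if boundaries[i] <= value < boundaries[i + 1]:
            return i
    raise ValueError(f"{value} falls outside every class")

boundaries = [15, 20, 25, 30, 35, 40, 45]  # 5-year age classes

# A value exactly on a class boundary goes unambiguously into the higher class:
print(classify(40, boundaries))  # 5, i.e. the "40 but less than 45" class
```

Because each interval is closed below and open above, a boundary value can never be counted twice or allocated in halves.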
Very often we are not especially interested in the separate class frequencies, but in the
number of items above or below a certain value. When this is the case, we form a cumulative
frequency distribution as illustrated in column three of the following table:
The cumulative frequency tells us the number of items equal to or less than the specified
value, and it is formed by the successive addition of the separate frequencies. A cumulative
frequency column may also be formed for a grouped distribution.
The above example gives us the number of items "less than" a certain amount, but we may
wish to know, for example, the number of persons having more than some quantity. This can
easily be done by doing the cumulative additions from the bottom of the table instead of the
top, and as an exercise you should now compile the "more than" cumulative frequency
column in the above example.
Table 4.6
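Both forms of cumulative frequency are simply running sums taken from opposite ends of the table. A short Python sketch with illustrative class frequencies:

```python
from itertools import accumulate

freqs = [2, 3, 4, 5, 4, 2]  # illustrative class frequencies (total 20)

less_than = list(accumulate(freqs))              # successive additions from the top
more_than = list(accumulate(freqs[::-1]))[::-1]  # successive additions from the bottom

print(less_than)  # [2, 5, 9, 14, 18, 20]
print(more_than)  # [20, 18, 15, 11, 6, 2]
```

The final "less than" entry always equals the total frequency, and the first "more than" entry does likewise.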
All the frequency distributions which we have looked at so far in this study unit have had
their class frequencies expressed simply as numbers of items. However, remember that
proportions or percentages are useful secondary statistics. When the frequency in each class
of a frequency distribution is given as a proportion or percentage of the total frequency, the
result is known as a "relative frequency distribution" and the separate proportions or
percentages are the "relative frequencies". The total relative frequency is, of course, always
1.0 (or 100%). Cumulative relative frequency distributions may be compiled in the same way
as ordinary cumulative frequency distributions.
As an example, the distribution used in Table 4.5 is now set out as a relative frequency
distribution for you to study.
This example is in the "less than" form, and you should now compile the "more than" form in
the same way as you did for the non-relative distribution.
Table 4.7
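Relative frequencies are each class frequency divided by the total frequency. A Python sketch with illustrative frequencies:

```python
freqs = [2, 3, 4, 5, 4, 2]   # illustrative class frequencies
total = sum(freqs)

rel = [f / total for f in freqs]               # relative frequencies (proportions)
percentages = [round(100 * r, 1) for r in rel]

print(percentages)  # [10.0, 15.0, 20.0, 25.0, 20.0, 10.0]
```

As the text notes, the relative frequencies always total 1.0 (100%), which makes a useful check on the arithmetic.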
Tabulated frequency distributions are sometimes more readily understood if represented by a
diagram. Graphs and charts are normally much superior to tables (especially lengthy complex
tables) for showing general states and trends, but they cannot usually be used for accurate
analysis of data. The methods of presenting frequency distributions graphically are as follows:
Frequency dot diagram
Frequency bar chart
Frequency polygon
We will now examine each of these in turn.
Frequency Dot Diagram
This is a simple form of graphical representation for the frequency distribution of a discrete
variate. A horizontal scale is used for the variate and a vertical scale for the frequency. Above
each value on the variate scale we mark a dot for each occasion on which that value occurs.
Thus, a frequency dot diagram of the distribution of times taken to complete a given task,
which we have used in this study unit, would look like Figure 4.1.
Frequency Bar Chart
We can avoid the business of marking every dot in such a diagram by drawing instead a
vertical line the length of which represents the number of dots which should be there. The
frequency dot diagram in Figure 4.1 now becomes a frequency bar chart, as in Figure 4.2.
Figure 4.1: Frequency Dot Diagram to Show Length of Time Taken
by Operator to Complete a Given Task
Figure 4.2: Frequency Bar Chart
Frequency Polygon
Instead of drawing vertical bars as we do for a frequency bar chart, we could merely mark the
position of the top end of each bar and then join up these points with straight lines. When we
do this, the result is a frequency polygon, as in Figure 4.3.
Note that we have added two fictitious classes at each end of the distribution, i.e. we have
marked in groups with zero frequency at 3.3 and 4.0.
This is done to ensure that the area enclosed by the polygon and the horizontal axis is the
same as the area under the corresponding histogram, which we shall consider in the next
section.
These three kinds of diagram are all commonly used as a means of making frequency
distributions more readily comprehensible. They are mostly used in those cases where the
variate is discrete and where the values are not grouped. Sometimes frequency bar charts
and polygons are used with grouped data by drawing the vertical line (or marking its top end)
at the centre point of the group.
Figure 4.3: Frequency Polygon
The Histogram
This is the best way of graphing a grouped frequency distribution. It is of great practical
importance and is also a favourite topic among examiners. Refer back now to the grouped
distribution given earlier in Table 4.4 (ages of office workers) and then study Figure 4.5.
Figure 4.5: Histogram
We call this kind of diagram a "histogram". The frequency in each group is represented by a
rectangle and - this is a very important point - it is the AREA of the rectangle, not its
height, which represents the frequency.
When the lengths of the class intervals are all equal, then the heights of the rectangles
represent the frequencies in the same way as do the areas (this is why the vertical scale has
been marked in this diagram); if, however, the lengths of the class intervals are not all equal,
you must remember that the heights of the rectangles have to be adjusted to give the correct
areas. Do not stop at this point if you have not quite grasped the idea, because it will become
clearer as you read on.
Look once again at the histogram of ages given in Figure 4.5 and note particularly how it
illustrates the fact that the frequency falls off towards the higher age groups - any form of
graph which did not reveal this fact would be misleading. Now let us imagine that the
original table had NOT used equal class intervals but, for some reason or other, had given the
last few groups as:
The last two groups have been lumped together as one. A WRONG form of histogram, using
heights instead of areas, would look like Figure 4.6.
Table 4.8
Now, this clearly gives an entirely wrong impression of the distribution with respect to the
higher age groups. In the correct form of the histogram, the height of the last group (50-60)
would be halved because the class interval is double all the other class intervals. The
histogram in Figure 4.7 gives the right impression of the falling off of frequency in the higher
age groups. I have labelled the vertical axis "Frequency density per 5-year interval" as five
years is the "standard" interval on which we have based the heights of our rectangles.
Figure 4.6
Often it happens, in published statistics, that the last group in a frequency table is not
completely specified. The last few groups may look as in Table 4.9:
Figure 4.7
Table 4.9
How do we draw the last group on the histogram?
If the last group has a very small frequency compared with the total frequency (say, less than
about 1% or 2%) then nothing much is lost by leaving it off the histogram altogether. If the
last group has a larger frequency than about 1% or 2%, then you should try to judge from the
general shape of the histogram how many class intervals to spread the last frequency over in
order not to create a false impression of the extent of the distribution. In the example given,
you would probably spread the last 30 people over two or three class intervals but it is often
simpler to assume that an open-ended class has the same length as its neighbour. Whatever
procedure you adopt, the important thing in an examination paper is to state clearly what you
have done and why. A distribution of the kind we have just discussed is called an "open-
ended" distribution.
The Ogive
This is the name given to the graph of the cumulative frequency. It can be drawn in either the
"less than" or the "or more" form, but the "less than" form is the usual one. Ogives for two of
the distributions already considered in this study unit are now given as examples; Figure 4.8
is for ungrouped data and Figure 4.9 is for grouped data.
Study these two diagrams so that you are quite sure that you know how to draw them. There
is only one point which you might be tempted to overlook in the case of the grouped
distribution - the points are plotted at the ends of the class intervals and NOT at the centre
point. Look at the example and see how the 168,000 is plotted against the upper end of the
56-60 group and not against the mid-point, 58. If we had been plotting an "or more" ogive,
the plotting would have to have been against the lower end of the group.
As an example of an "or more" ogive, we will compile the cumulative frequency of our
example from Section B, which for convenience is repeated below with the "more than"
cumulative frequency:
Figure 4.8
Figure 4.9
Table 4.10
Check that you see how the plotting has been made against the lower end of the group and
notice how the ogive has a reversed shape.
In each of Figures 4.9 and 4.10 we have added a fictitious group of zero frequency at one end
of the distribution.
It is common practice to call the cumulative frequency graph a cumulative frequency polygon
if the points are joined by straight lines, and a cumulative frequency curve if the points are
joined by a smooth curve.
(N.B. Unless you are told otherwise, always compile a "less than" cumulative frequency.)
All of these diagrams, of course, may be drawn from the original figures or on the basis of
relative frequencies. In more advanced statistical work the latter are used almost exclusively
and you should practise using relative frequencies whenever possible.
The ogive now appears as shown in Figure 4.10.
Figure 4.10
The graphs we have seen so far in this study unit are all based on frequency distributions.
Next we shall discuss several common graphical presentations that are designed more for the
lay reader than for someone with statistical knowledge. You will certainly have seen some
examples of them used in the mass media of newspapers and television.
This is the simplest method of presenting information visually. These diagrams are variously
called "pictograms", "ideograms", "picturegrams" or "isotypes" - the words all refer to the
same thing. Their use is confined to the simplified presentation of statistical data for the
general public. Pictograms consist of simple pictures which represent quantities. There are
two types and these are illustrated in the following examples. The data we will use is shown
in Table 4.11.
Table 4.11: Cruises Organised by a Shipping Line Between Year
1 and Year 3
Limited Form
a) We could represent the number of cruises by ships of varying size, as in Figure 4.11.
b) Although these diagrams show that the number of cruises has increased each year,
they can give false impressions of the actual increases. The reader can become
confused as to whether the quantity is represented by the length or height of the
pictograms, their area on the paper, or the volume of the object they represent. It is
difficult to judge what increase has taken place. Sometimes you will find pictograms
in which the sizes shown are actually WRONG in relation to the real increases. To
avoid confusion, I recommend that you use the style of diagram shown in Figure 4.12.
Figure 4.11: Number of Cruises Years 1-3
(Source: Table 4.11)
Accurate Form
Each matchstick man is the same height and represents 20,000 passengers, so there can be no
confusion over size.
These diagrams have no purpose other than generally presenting statistics in a simple way.
Look at Figure 4.13.
Figure 4.12: Passengers Carried Years 1-3
(Source: Table 4.11)
Figure 4.13: Imports of Crude Oil
Here it is difficult to represent a quantity less than 10m barrels, e.g. does "[" represent 0.2m
or 0.3m barrels?
These diagrams, known also as circular diagrams, are used to show the manner in which
various components add up to a total. Like pictograms, they are only used to display very
simple information to non-expert readers. They are popular in computer graphics.
An example will show what the pie chart is. Suppose that we wish to illustrate the sales of
gas in Rwanda in a certain year. The figures are shown in Table 4.12.
The figures are illustrated in the pie chart or circular diagram in Figure 4.14.
Table 4.12: Gas Sales in Rwanda in One Year
Figure 4.14: Example of a Pie Chart (Gas Sales in Rwanda)
(Source: Table 4.12)
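Each sector's angle is its component's share of the total, scaled to the 360 degrees of a full circle. A Python sketch (the sales figures below are hypothetical, not the actual Table 4.12 values):

```python
# Hypothetical gas-sales components (therms) - NOT the actual Table 4.12 figures.
sales = {"Domestic": 50, "Industrial": 30, "Commercial": 15, "Other": 5}
total = sum(sales.values())

# Each sector's angle is its share of the total scaled to a full circle.
angles = {sector: 360 * value / total for sector, value in sales.items()}

for sector, angle in angles.items():
    print(f"{sector}: {angle:.1f} degrees")
```

The angles always total 360 degrees, which is a convenient check before reaching for the protractor.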
c) Construct the diagram by means of a pair of compasses and a protractor. Don’t
overlook this point, because examiners dislike inaccurate and roughly drawn
diagrams.
d) Label the diagram clearly, using a separate "legend" or "key" if necessary. (A key is
illustrated in Figure 4.14.)
e) If you have the choice, don’t use a diagram of this kind with more than four or five
component parts.
Note: The actual number of therms can be inserted on each sector as it is not possible to read
this exactly from the diagram itself.
The main use of a pie chart is to show the relationship each component part bears to the
whole. They are sometimes used side by side to provide comparisons, but this is not really to
be recommended, unless the whole diagram in each case represents exactly the same total
amount, as other diagrams (such as bar charts, which we discuss next) are much clearer.
However, in examinations you may be asked specifically to prepare such pie charts.
We have already met one kind of bar chart in the course of our studies of frequency
distributions, namely the frequency bar chart. A "bar" is simply another name for a thick line.
In a frequency bar chart the bars represent, by their length, the frequencies of different values
of the variate. The idea of a bar chart can, however, be extended beyond the field of
frequency distributions, and we will now illustrate a number of the types of bar chart in
common use. I say "illustrate" because there are no rigid and fixed types, but only general
ideas which are best studied by means of examples. You can supplement the examples in this
study unit by looking at the commercial pages of newspapers and magazines.
Component Bar Chart
This first type of bar chart serves the same purpose as a circular diagram and, for that reason,
is sometimes called a "component bar diagram" (see Figure 4.15).
Figure 4.15: Component Bar Chart Showing Cost
of Production of ZYX Co. Ltd
Note that the lengths of the components represent the amounts, and that the components are
drawn in the same order so as to facilitate comparison. These bar charts are preferable to
circular diagrams because:
a) They are easily read, even when there are many components.
b) They are more easily drawn.
c) It is easier to compare several bars side by side than several circles.
Bar charts with vertical bars are sometimes called "column charts" to distinguish them from
those in which the bars are horizontal (see Figure 4.16).
Figure 4.16 is also an example of a percentage component bar chart, i.e. the information is
expressed in percentages rather than in actual numbers of visitors.
If you compare several percentage component bar charts, you must be careful. Each bar chart
will be the same length, as they each represent 100%, but they will not necessarily represent
the same actual quantities, e.g. 50% might have been 1 million, whereas in another year it
may have been nearer to 4 million and in another to 8 million.
Figure 4.16: Horizontal Bar Chart of Visitors
Arriving in Rwanda in One Year
Horizontal Bar Chart
A typical case of presentation by a horizontal bar chart is shown in Figure 4.17. Note how a
loss is shown by drawing the bar on the other side of the zero line.
Pie charts and bar charts are especially useful for "categorical" variables as well as for
numerical variables. The example in Figure 4.17 shows a categorical variable, i.e. the
different branches form the different categories, whereas in Figure 4.15 we have a numerical
variable, namely, time. Figure 4.17 is also an example of a multiple or compound bar chart as
there is more than one bar for each category.
Figure 4.17: Horizontal Bar Chart for the So and So Company Ltd to
Show Profits Made by Branches in Year 1 and Year 2
There are a number of general rules which must be borne in mind when planning and using
graphical methods:
a) Graphs and charts must be given clear but brief titles.
b) The axes of graphs must be clearly labelled, and the scales of values clearly marked.
c) Diagrams should be accompanied by the original data, or at least by a reference to the
source of the data.
d) Avoid excessive detail, as this defeats the object of diagrams.
e) Wherever necessary, guidelines should be inserted to facilitate reading.
f) Try to include the origins of scales. Obeying this rule sometimes leads to rather a
waste of paper space. In such a case the graph could be "broken" as shown in Figure
4.18, but take care not to distort the graph by over-emphasising small variations.
Figure 4.18
One of the problems which frequently confronts the statistician working in economics or
industry is that of CONCENTRATION. Suppose that, in a business employing 100 men, the
total weekly wages bill is RWF10,000 and that every one of the 100 men gets RWF100; there
is then an equal distribution of wages and there is no concentration. Suppose now that, in
another business employing 100 men and having a total weekly wages bill of RWF10,000,
there are 12 highly skilled experts getting RWF320 each and 88 unskilled workers getting
RWF70 each. The wages are not now equally distributed and there is some concentration of
wages in the hands of the skilled experts. These experts number 12 out of 100 people (i.e.
they constitute 12% of the labour force); their share of the total wages bill is 12 x RWF320
(i.e. RWF3,840) out of RWF10,000, which is 38.4%. We can therefore say that 38.4% of the
firm’s wages is concentrated in the hands of only 12% of its employees.
In the example just discussed there were only two groups, the skilled and the unskilled. In a
more realistic case, however, there would be a larger number of groups of people with
different wages, as in the following example:
Wages Group (RWF) Number of People Total Wages (RWF)
0 - 80 205 10,250
80 - 120 200 22,000
120 - 160 35 4,900
160 - 200 30 5,700
200 - 240 20 4,400
240 - 280 10 2,500
Total 500 49,750
Obviously when we have such a set of figures, the best way to present them is to graph them,
which I have done in Figure 4.19. Such a graph is called a LORENZ CURVE. (The next
section shows how we obtain this graph.)
Figure 4.19: Lorenz Curve
Stages in Construction of a Lorenz Curve
a) Draw up a table giving:
(i) the cumulative frequency;
(ii) the percentage cumulative frequency;
(iii) the cumulative wages total;
(iv) the percentage cumulative wages total.
b) On graph paper draw scales of 0-100% on both the horizontal and vertical axes. The
scales should be the same on both axes.
c) Plot the cumulative percentage frequency against the cumulative percentage wages
total and join up the points with a smooth curve. Remember that 0% of the employees
earn 0% of the total wages so that the curve will always go through the origin.
d) Draw in the 45° diagonal. Note that, if the wages had been equally distributed, i.e.
50% of the people had earned 50% of the total wages, etc., the Lorenz curve would
have been this diagonal line.
The graph is shown in Figure 4.19.
Table 4.13
Sometimes you will be given the wages bill as a grouped frequency distribution alone,
without the total wages for each group being specified. Consider the following set of figures:
Wages Group (RWF) No. of People
0 - 40 600
40 - 80 250
80 - 120 100
120 - 160 30
160 - 200 20
As we do not know the actual wage of each person, the total amount of money involved in
each group is estimated by multiplying the number of people in the group by the mid-value of
the group; for example, the total amount of money in the "RWF40-RWF80" group is 250 x
RWF60 = RWF15,000. The construction of the table and the Lorenz curve then follows as
before. Try working out the percentages for yourself first and then check your answers with
the following table. Your graph should look like Figure 4.20.
Table 4.14
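The percentage columns can be computed directly from the grouped data given above, estimating each group's wages from its mid-value as described. A Python sketch:

```python
# Grouped wage data from the example above: (lower, upper, number of people)
groups = [(0, 40, 600), (40, 80, 250), (80, 120, 100), (120, 160, 30), (160, 200, 20)]

total_people = sum(n for _, _, n in groups)
wages = [n * (lo + hi) / 2 for lo, hi, n in groups]  # mid-value estimate per group
total_wages = sum(wages)

rows = []
cum_p = cum_w = 0.0
for (lo, hi, n), w in zip(groups, wages):
    cum_p += n
    cum_w += w
    rows.append((100 * cum_p / total_people, 100 * cum_w / total_wages))

for (lo, hi, _), (pp, pw) in zip(groups, rows):
    print(f"up to RWF{hi}: {pp:.1f}% of people earn {pw:.1f}% of total wages")
```

Plotting the pairs in rows (cumulative % of people against cumulative % of wages) gives the Lorenz curve of Figure 4.20.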
Interpretation of the Curve
From Figure 4.20 we can read directly the share of the wages paid to any given percentage of
employees:
a) 50% of the employees earn 22% of the total wages, so we can deduce that the other
50%, i.e. the more highly paid employees, earn 78% of the total wages.
b) 90% of the employees earn 70% of the total wages, so 10% of the employees must
earn 30% of the total wages.
c) 95% of the employees earn 83% of the total wages, so 5% of the employees earn 17%
of the total wages.
Figure 4.20: Lorenz Curve
Other Uses
Although usually used to show the concentration of wealth (incomes, property ownership,
etc.), Lorenz curves can also be employed to show concentration of any other feature. For
example, the largest proportion of a country’s output of a particular commodity may be
produced by only a small proportion of the total number of factories, and this fact can be
illustrated by a Lorenz curve.
Concentration of wealth or productivity, etc. may become more or less as time goes on. A
series of Lorenz curves on one graph will show up such a state of affairs. In some countries,
in recent years, there has been a tendency for incomes to be more equally distributed. A
Lorenz curve reveals this because the curves for successive years lie nearer to the straight
diagonal line.
Averages or Measures of Location
The Need for Measures of Location
The Arithmetic Mean
The Mean of a Simple Frequency Distribution
The Mean of a Grouped Frequency Distribution
Simplified Calculation
Characteristics of the Arithmetic Mean
The Mode
Mode of a Simple Frequency Distribution
Mode of a Grouped Frequency Distribution
Characteristics of the Mode
The Median
Median of a Simple Frequency Distribution
Median of a Grouped Frequency Distribution
Characteristics of the Median
We looked at frequency distributions in detail in the previous study unit and you should, by
means of a quick revision, make sure that you have understood them before proceeding.
A frequency distribution may be used to give us concise information about its variate, but
more often, we will wish to compare two or more distributions. Consider, for example, the
distribution of the weights of eggs from two different breeds of poultry (which is a topic in
which you would be interested if you were the statistician in an egg marketing company).
Having weighed a large number of eggs from each breed, we would have compiled frequency
distributions and graphed the results. The two frequency polygons might well look something
like Figure 5.1.
Examining these distributions you will see that they look alike except for one thing - they are
located on different parts of the scale. In this case the distributions overlap and, although
some eggs from Breed A are of less weight than some eggs from Breed B, eggs from Breed A
are, in general, heavier than those from Breed B.
Figure 5.1
Remember that one of the objects of statistical analysis is to condense unwieldy data so as to
make it more readily understood. The drawing of frequency curves has enabled us to make an
important general statement concerning the relative egg weights of the two breeds of poultry,
but we would now like to take the matter further and calculate some figure which will serve
to indicate the general level of the variable under discussion. In everyday life we commonly
use such a figure when we talk about the "average" value of something or other. We might
have said, in reference to the two kinds of egg, that those from Breed A had a higher average
weight than those from Breed B. Distributions with different averages indicate that there is a
different general level of the variate in the two groups. The single value which we use to
describe the general level of the variate is called a "measure of location" or a "measure of
central tendency" or, more commonly, an average.
There are three such measures with which you need to be familiar:
The arithmetic mean
The mode
The median.
Page 141
This is what we normally think of as the "average" of a set of values. It is obtained by adding
together all the values and then dividing the total by the number of values involved. Take, for
example, the following set of values which are the heights, in inches, of seven men:
Man Height (ins)
A 74
B 63
C 64
D 71
E 71
F 66
G 74
Total 483
The arithmetic mean of these heights is 483 ÷ 7 = 69 ins. Notice that some values occur more
than once, but we still add them all.
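As a quick check on the arithmetic, the calculation above might be sketched in Python (the choice of language is ours, not the manual's):

```python
# Arithmetic mean of the seven heights from the example above (men A-G).
heights = [74, 63, 64, 71, 71, 66, 74]  # heights in inches

total = sum(heights)   # Σx = 483
n = len(heights)       # n = 7
mean = total / n       # x̄ = Σx / n

print(total, mean)  # 483 69.0
```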
At this point we must introduce a little algebra. We don’t always want to specify what
particular items we are discussing (heights, egg weights, wages, etc.) and so, for general
discussion, we use, as you will recall from algebra, some general letter, usually x. Also, we
indicate the sum of a number of x’s by Σ (sigma).
Thus, in our example, we may write:
Σx = 483
Page 142
We indicate the arithmetic mean by the symbol x̄ (called "x bar") and the number of items
by the letter n. The calculation of the arithmetic mean can then be described by formula thus:

x̄ = (x1 + x2 + ... + xn)/n = Σx/n

The last form is customary in statistical work. Applying it to the example above, we have:

x̄ = 483/7 = 69 ins
You will often find the arithmetic mean simply referred to as "the mean" when there is no
chance of confusion with other means (which we are not concerned with here).
The Mean of a Simple Frequency Distribution
When there are many items (i.e. when n is large) the arithmetic can be eased somewhat by
forming a frequency distribution, like this:
Table 5.1
Page 143
Table 5.2
Indicating the frequency of each value by the letter f, you can see that Σf = n and that, when
the x's are not all the separate values but only the different ones, the formula becomes:

x̄ = Σ(fx)/Σf
Of course, with only seven items it would not be necessary, in practice, to use this method,
but if we had a much larger number of items the method would save a lot of additions.
a) Consider now Table 5.2. Complete the (fx) column and calculate the value of the
arithmetic mean, x̄.
Page 144
You should have obtained the following answers:
The total number of items, ∑f = 100
The total product, ∑(fx) = 713
The arithmetic mean, x̄ = 713/100 = 7.13
Make sure that you understand this study unit so far. Revise it if necessary, before going on
to the next paragraph. It is most important that you do not get muddled about calculating
arithmetic means.
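The frequency-distribution formula x̄ = Σ(fx)/Σf can be sketched as follows, using the seven heights regrouped by value (as in Table 5.1):

```python
# Mean of a simple frequency distribution: x̄ = Σ(fx) / Σf.
# The data reproduce the seven heights grouped by value.
dist = {63: 1, 64: 1, 66: 1, 71: 2, 74: 2}  # value x -> frequency f

sum_f = sum(dist.values())                    # Σf = n = 7
sum_fx = sum(x * f for x, f in dist.items())  # Σ(fx) = 483
mean = sum_fx / sum_f
print(mean)  # 69.0
```

With the figures from the exercise above (Σf = 100, Σ(fx) = 713) the same formula gives 7.13.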
The Mean of a Grouped Frequency Distribution
Suppose now that you have a grouped frequency distribution. In this case, you will
remember, we do not know the actual individual values, only the groups in which they lie.
How, then, can we calculate the arithmetic mean? The answer is that we cannot calculate the
exact value of
, but we can make an approximation sufficiently accurate for most statistical
purposes. We do this by assuming that all the values in any group are equal to the mid-point
of that group.
The procedure is very similar to that for a simple frequency distribution.
Provided that Σf is not less than about 50 and that the number of groups is not less than about
12, the arithmetic mean thus calculated is sufficiently accurate for all practical purposes.
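The mid-point method can be sketched as follows. The class limits match Table 5.3, but the frequencies are illustrative stand-ins (the table's own figures are not reproduced in this extract), chosen so that Σf = 50 and the mean agrees with the 33.2 worked later in this section:

```python
# Mean of a grouped frequency distribution, assuming every value in a
# group sits at that group's mid-point.
# NOTE: the frequencies below are assumed for illustration only.
groups = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [3, 7, 11, 14, 8, 4, 3]

midpoints = [(lo + hi) / 2 for lo, hi in groups]       # 5, 15, ..., 65
sum_f = sum(freqs)                                     # Σf = 50
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))  # Σ(fx) = 1660
mean = sum_fx / sum_f
print(mean)  # 33.2
```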
Table 5.3
0 ‹ 10
10 ‹ 20
20 ‹ 30
30 ‹ 40
40 ‹ 50
50 ‹ 60
60 ‹ 70
Page 145
There is one pitfall to be avoided when using this method: if the groups do not all have the
same class interval, be sure that you use the correct mid-values! The following is part of a
table with varying class intervals, to illustrate the point:
You will remember that in discussing the drawing of histograms we had to deal with the case
where the last group was not exactly specified. The same rules for drawing the histogram
apply to the calculation of the arithmetic mean.
Simplified Calculation
It is possible to simplify the arithmetic still further by the following two devices:
a) Work from an assumed mean in the middle of one convenient class.
b) Work in class intervals instead of in the original units.
Let us consider device (a). If you go back to our earlier examples you will discover after
some arithmetic that if you add up the differences in value between each reading and the true
mean, then these differences add up to zero.
Mid Value (x)
Table 5.4
Page 146
Take first the height distribution discussed at the start of Section B:
x̄ = 483/7 = 69 ins
Secondly, consider the grouped frequency distribution given earlier in this section:
x̄ = 33.2
Table 5.5
Table 5.6
0 ‹ 10
10 ‹ 20
20 ‹ 30
30 ‹ 40
40 ‹ 50
50 ‹ 60
60 ‹ 70
Page 147
If we take any value other than x̄ and follow the same procedure, the sum of the differences
(sometimes called deviations) will not be zero. In our first example, let us assume the mean to
be 68 ins and label the assumed mean xo. The differences between each reading and this
assumed value are:
We make use of this property and we use this method as a "short-cut" for finding x̄. Firstly,
we have to choose some value of x as an assumed mean. We try to choose it near to where
we think the true mean, x̄, will lie, and we always choose it as the mid-point of one of the
groups when we are involved with a grouped frequency distribution. In the above example,
the total deviation, d, does not equal zero, so 68 cannot be the true mean. As the total
deviation is positive, we must have UNDERESTIMATED in our choice of xo, so the true
mean is higher than 68. As there are seven readings, we need to adjust xo upwards by one
seventh of the total deviation, i.e. by (+7)/7 = +1. Therefore the true value of x̄ is
68 + 1 = 69 ins. We know this to be the correct answer from our earlier work.
Let us now illustrate the "short-cut" method for the grouped frequency distribution. We shall
take xo as 35 as this is the mid-value in the centre of the distribution.
Table 5.7
Page 148
This time we must have OVERESTIMATED xo, as the total deviation, Σfd, is negative. As
there are 50 readings altogether, the true mean must be 1/50th of the (-90) lower than 35, i.e.

x̄ = 35 − 90/50 = 35 − 1.8 = 33.2

which is as we found previously.
Device (b) can be used with a grouped frequency distribution to work in units of the class
interval instead of in the original units. In the fourth column of Table 5.7, you can see that all
the deviations are multiples of 10, so we could have worked in units of 10 throughout and
then compensated for this at the end of the calculation.
Let us repeat the calculation using this method. The result (with xo = 35) is:
Table 5.8
0 ‹ 10
10 ‹ 20
20 ‹ 30
30 ‹ 40
40 ‹ 50
50 ‹ 60
Page 149
The symbol used for the length of the class interval is c, but you may also come across the
symbol i used for this purpose.
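Devices (a) and (b) together can be sketched as follows. The deviations d are measured in whole class intervals from an assumed mean x0 = 35, and the frequencies are illustrative stand-ins (not the textbook's own figures), chosen to reproduce the Σfd = −90 in original units and the mean of 33.2 worked in the text:

```python
# "Short-cut" mean: assumed mean x0 at a class mid-point, deviations d
# in whole class intervals c, then a correction at the end:
#   x̄ = x0 + (Σfd / Σf) × c
# NOTE: the frequencies below are assumed for illustration only.
x0, c = 35, 10
d = [-3, -2, -1, 0, 1, 2, 3]   # deviations in class-interval units
f = [3, 7, 11, 14, 8, 4, 3]

sum_f = sum(f)                                  # 50 readings
sum_fd = sum(fi * di for fi, di in zip(f, d))   # -9 class intervals (= -90 in original units)
mean = x0 + (sum_fd / sum_f) * c
print(round(mean, 2))  # 33.2
```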
Table 5.9
0 ‹ 10
10 ‹ 20
20 ‹ 30
30 ‹ 40
40 ‹ 50
50 ‹ 60
60 ‹ 70
Page 150
As we mentioned at an earlier stage, you have to be very careful if the class intervals are
unequal, because you can only use one such interval as your working unit. Table 5.10 shows
you how to deal with this situation.
The assumed mean is 35, as before, and the working unit is a class interval of 10. Notice how
d for the last group is worked out; the mid-point is 60, which is 2½ times 10 above the
assumed mean. The required arithmetic mean is, therefore:
We have reached a slightly different figure from before because of the error introduced by the
coarser grouping in the "50-70" region.
The method just described is of great importance both in everyday statistical work and in
examinations. By using it correctly, you can often do the calculations for very complicated-
looking distributions using mental arithmetic, pencil and paper.
With the advent of electronic calculators, the time saving on calculations of the arithmetic
mean is not great, but this method is still preferable because:
a) The numbers involved are smaller and thus you are less likely to make slips in the
arithmetic.
b) The method can be extended to enable us to find easily the standard deviation of a
frequency distribution.
Table 5.10
0 ‹ 10
10 ‹ 20
20 ‹ 30
30 ‹ 40
40 ‹ 50
50 ‹ 60
60 ‹ 70
Page 151
Characteristics of the Arithmetic Mean
There are a number of characteristics of the arithmetic mean which you must know and
understand. Apart from helping you to understand the topic more thoroughly, the following
are the points which an examiner expects to see when he or she asks for "brief notes" on the
arithmetic mean:
a) It is not necessary to know the value of every item in order to calculate the arithmetic
mean. Only the total and the number of items are needed. For example, if you know
the total wages bill and the number of employees, you can calculate the arithmetic
mean wage without knowing the wages of each person.
b) It is fully representative because it is based on all, and not only some, of the items in
the distribution.
c) One or two extreme values can make the arithmetic mean somewhat unreal by their
influence on it. For example, if a millionaire came to live in a country village, the
inclusion of his income in the arithmetic mean for the village would make the place
seem very much better off than it really was!
d) The arithmetic mean is reasonably easy to calculate and to understand.
e) In more advanced statistical work it has the advantage of being amenable to algebraic
manipulation.
Page 152
1) Table 5.11 shows the consumption of electricity of 100 householders during a
particular week. Calculate the arithmetic mean consumption of the 100 householders.
Table 5.11
Page 153
Mode of a Simple Frequency Distribution
The first alternative to the mean which we will discuss is the mode. This is the name given to
the most frequently occurring value. Look at the following frequency distribution:
In this case the most frequently occurring value is 1 (it occurred 39 times) and so the mode of
this distribution is 1. Note that the mode, like the mean, is a value of the variate, x, not the
frequency of that value. A common error is to say that the mode of the above distribution is
39. THIS IS WRONG. The mode is 1. Watch out, and do not fall into this trap!
For comparison, calculate the arithmetic mean of the distribution: it works out at 1.52. The
mode is used in those cases where it is essential for the measure of location to be an actually
occurring value. An example is the case of a survey carried out by a clothing store to
determine what size of garment to stock in the greatest quantity. Now, the average size of
garment in demand might turn out to be, let us say, 9.3724, which is not an actually occurring
value and doesn’t help us to answer our problem. However, the mode of the distribution
obtained from the survey would be an actual value (perhaps size 8) and it would provide the
answer to the problem.
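Finding the mode of a simple frequency distribution can be sketched as follows. The frequencies for 0 and 1 accidents (27 and 39) come from the discussion of this distribution later in the unit; the tail figures are illustrative stand-ins chosen so that the totals agree (Σf = 123, mean ≈ 1.52):

```python
# Mode of a simple frequency distribution: the VALUE of the variate
# with the largest frequency - NOT that frequency itself.
# NOTE: frequencies for 2 accidents and above are assumed for illustration.
freq = {0: 27, 1: 39, 2: 34, 3: 15, 4: 5, 5: 3}  # accidents -> frequency

mode = max(freq, key=freq.get)                       # value with largest f
mean = sum(x * f for x, f in freq.items()) / sum(freq.values())
print(mode, round(mean, 2))  # 1 1.52
```

Note that the answer is 1 (the value), not 39 (its frequency).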
Table 5.12
Page 154
Mode of a Grouped Frequency Distribution
When the data is given in the form of a grouped frequency distribution, it is not quite so
easy to determine the mode. What, you might ask, is the mode of the following distribution?
All we can really say is that "70 ‹ 80" is the modal group (the group with the largest
frequency). You may be tempted to say that the mode is 75, but this is not true, nor even a
useful approximation in most cases. The reason is that the modal group depends on the
method of grouping, which can be chosen quite arbitrarily to suit our convenience. The
distribution could have been set out with class intervals of five instead of 10, and would then
have appeared as follows (only the middle part is shown, to illustrate the point):
Table 5.13
Table 5.14
0 ‹ 10
10 ‹ 20
20 ‹ 30
30 ‹ 40
40 ‹ 50
50 ‹ 60
60 ‹ 70
70 ‹ 80
80 ‹ 90
90 ‹ 100
100 ‹ 110
110 ‹ 120
Page 155
The modal group is now "65-70". Likewise, we will get different modal groups if the
grouping is by 15 or by 20 or by any other class interval, and so the mid-point of the modal
group is not a good way of estimating the mode.
In practical work, this determination of the modal group is usually sufficient, but examination
papers occasionally ask for the mode to be determined from a grouped distribution.
A number of procedures based on the frequencies in the groups adjacent to the modal group
can be used, and I will now describe one procedure. You should note, however, that these
procedures are only mathematical devices for finding the MOST LIKELY position of the
mode; it is not possible to calculate an exact and true value in a grouped distribution.
We saw that the modal group of our original distribution was "70-80". Now examine the
groups on each side of the modal group; the group below (i.e. 60-70) has a frequency of 38,
and the one above (i.e. 80-90) has a frequency of 20. This suggests to us that the mode may
be some way towards the lower end of the modal group rather than at the centre. A graphical
method for estimating the mode is shown in Figure 5.2.
This method can be used when the distribution has equal class intervals. Draw that part of the
histogram which covers the modal class and the adjacent classes on either side.
Draw in the diagonals AB and CD as shown in Figure 5.2. From the point of intersection
draw a vertical line downwards. Where this line crosses the horizontal axis is the mode. In
our example the mode is just less than 71.
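The diagonal construction of Figure 5.2 is algebraically equivalent to the standard interpolation formula mode = L + (fm − f0)/((fm − f0) + (fm − f2)) × c, where L is the lower bound of the modal class, fm its frequency, f0 and f2 the frequencies of the classes below and above, and c the class interval. A sketch follows; f0 = 38 and f2 = 20 are quoted in the text, but fm = 40 is an assumed modal frequency (not given in this extract) consistent with a mode just below 71:

```python
# Interpolated mode of a grouped distribution - equivalent to the
# graphical diagonal method of Figure 5.2.
# NOTE: fm = 40 is an assumed modal-class frequency for illustration.
L_bound, c = 70, 10        # modal class 70-80
fm, f0, f2 = 40, 38, 20    # modal, below, above frequencies

mode = L_bound + (fm - f0) / ((fm - f0) + (fm - f2)) * c
print(round(mode, 2))  # 70.91
```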
Page 156
Figure 5.2
Page 157
Characteristics of the Mode
Some of the characteristics of the mode are worth noting as you may well be asked to
compare them with those of the arithmetic mean.
a) The mode is very easy to find with ungrouped distributions, since no calculation is
required.
b) It can only be determined roughly with grouped distributions.
c) It is not affected by the occurrence of extreme values.
d) Unlike the arithmetic mean, it is not based on all the items in the distribution, but only
on those near its value.
e) In ungrouped distributions the mode is an actually occurring value.
f) It is not amenable to the algebraic manipulation needed in advanced statistical work.
g) It is not unique, i.e. there can be more than one mode. For example, in the set of
numbers, 6, 7, 7, 7, 8, 8, 9, 10, 10, 10, 12, 13, there are two modes, namely 7 and 10.
This set of numbers would be referred to as having a bimodal distribution.
h) The mode may not exist. For example, in the set of numbers 7, 8, 10, 11, 12, each
number occurs only once so this distribution has no mode.
Page 158
Page 159
The desirable feature of any measure of location is that it should be near the middle of the
distribution to which it refers. Now, if a value is near the middle of the distribution, then we
expect about half of the distribution to have larger values, and the other half to have smaller
values. This suggests to us that a possible measure of location might be that value which is
such that exactly half (i.e. 50%) of the distribution has larger values and exactly half has
lower values. The value which so divides the distribution into equal parts is called the
MEDIAN. Look at the following set of values:
6, 7, 7, 8, 8, 9, 10, 10, 10, 12, 13
The total of these eleven numbers is 100 and the arithmetic mean is therefore 100/11 = 9.091,
while the mode is 10 because that is the number which occurs most often (three times). The
median, however, is 9 because there are five values above and five values below 9. Our first
rule for determining the median is therefore as follows:
Arrange all the values in order of magnitude and the median is then the middle value.
Note that all the values are to be used: even though some of them may be repeated, they must
all be put separately into the list. In the example just dealt with, it was easy to pick out the
middle value because there was an odd number of values. But what if there is an even
number? Then, by convention, the median is taken to be the arithmetic mean of the two
values in the middle. For example, take the following set of values:
6, 7, 7, 8, 8, 9, 10, 10, 11, 12
The two values in the middle are 8 and 9, so that the median is 8.5
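The two rules just given (middle value when n is odd; mean of the two middle values when n is even) can be sketched as:

```python
# Median of a small set of values: sort, then take the middle value
# (odd n) or the mean of the two middle values (even n).
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([6, 7, 7, 8, 8, 9, 10, 10, 10, 12, 13]))  # 9
print(median([6, 7, 7, 8, 8, 9, 10, 10, 11, 12]))      # 8.5
```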
Page 160
Median of a Simple Frequency Distribution
Statistical data, of course, is rarely in such small groups and, as you have already learned, we
usually deal with frequency distributions. How, then do we find the median if our data is in
the form of a distribution?
Let us take the example of the frequency distribution of accidents already used in discussing
the mode. The total number of values is 123 and so when those values are arranged in order
of magnitude, the median will be the 62nd item because that will be the middle item. To see
what the value of the 62nd item will be, let us again draw up the distribution:
You can see from the last column that, if we were to list all the separate values in order, the
first 27 would all be 0s and from then up to the 66th would be 1s; it follows therefore that the
62nd item would be a 1 and that the median of this distribution is 1.
Median of a Grouped Frequency Distribution
The final problem connected with the median is how to find it when our data is in the form of
a grouped distribution. The solution to the problem, as you might expect, is very similar to
the solution for an ungrouped distribution; we halve the total frequency and then find, from
the cumulative frequency column, the corresponding value of the variate.
Table 5.15
Page 161
Because a grouped frequency distribution nearly always has a large total frequency, and
because we do not know the exact values of the items in each group, it is not necessary to
find the two middle items when the total frequency is even: just halve the total frequency and
use the answer (whether it is a whole number or not) for the subsequent calculation.
The total frequency is 206 and therefore the median is the 103rd item which, from the
cumulative frequency column, must lie in the 60-70 group. But exactly where in the 60-70
group? Well, there are 92 items before we get to that group and we need the 103rd item, so
we obviously need to move into that group by 11 items. Altogether in our 60-70 group there
are 38 items so we need to move 11/38 of the way into that group, that is 11/38 of 10 above
60. Our median is therefore
60 + 110/38 = 60 + 2.89 = 62.89.
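The interpolation just performed can be sketched in code, using the figures worked in the text (total frequency 206, 92 items below the 60-70 group, 38 items within it):

```python
# Median of a grouped distribution by linear interpolation within the
# median class: median = L + (n/2 - cum_before) / group_f * c
total_f = 206
cum_before = 92      # cumulative frequency below 60
group_f = 38         # frequency of the 60-70 group
L_bound, c = 60, 10

item = total_f / 2   # the 103rd item
median = L_bound + (item - cum_before) / group_f * c
print(round(median, 2))  # 62.89
```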
The use of the cumulative frequency distribution will, no doubt, remind you of its graphical
representation, the ogive. In practice, a convenient way to find the median of a grouped
distribution is to draw the ogive and then, against a cumulative frequency of half the total
frequency, to read off the median. In our example the median would be read against 103 on
the cumulative frequency scale (see Figure 5.3). If the ogive is drawn with relative
frequencies, then the median is always read off against 50%.
Table 5.16
0 ‹ 10
10 ‹ 20
20 ‹ 30
30 ‹ 40
40 ‹ 50
50 ‹ 60
60 ‹ 70
70 ‹ 80
80 ‹ 90
90 ‹ 100
100 ‹ 110
110 ‹ 120
Page 162
Figure 5.3
Page 163
Characteristics of the Median
Characteristic features of the median, which you should compare with those of the mean and
the mode, are as follows:
a) It is fairly easily obtained in most cases, and is readily understood as being the "half-
way point".
b) It is less affected by extreme values than the mean. The millionaire in the country
village might alter considerably the mean income of the village but he would have
almost no effect at all on the median.
c) It can be obtained without actually having all the values. If, for example, we want to
know the median height of a group of 21 men, we do not have to measure the height
of every single one; it is only necessary to stand the men in order of their heights and
then only the middle one (No. 11) need be measured, for his height will be the median
height. The median is thus of value when we have open-ended classes at the edges of
the distribution as its calculation does not depend on the precise values of the variate
in these classes, whereas the value of the arithmetic mean does.
d) The median is not very amenable to further algebraic manipulation.
Page 164
Page 165
Measures of Dispersion
Introduction to Dispersion
The Range
The Quartile Deviation, Deciles and Percentiles
The Quartile Deviation
Calculation of the Quartile Deviation
Deciles and Percentiles
The Standard Deviation
The Variance
Standard Deviation of a Simple Frequency Distribution
Standard Deviation of a Grouped Frequency Distribution
Characteristics of the Standard Deviation
The Coefficient of Variation
Averages & Measures of Dispersion
Measures of Central Tendency and Dispersion
The Mean and Standard Deviation
The Standard Deviation
The Median and the Quartiles
The Mode
Dispersion and Skewness
Page 166
Page 167
In order to get an idea of the general level of values in a frequency distribution, we have
studied the various measures of location that are available. However, the figures
which go to make up a distribution may all be very close to the central value, or they may
be widely dispersed about it, e.g. the mean of 49 and 51 is 50, but the mean of 0 and 100 is
also 50! You can see, therefore, that two distributions may have the same mean but the
individual values may be spread about the mean in vastly different ways.
When applying statistical methods to practical problems, a knowledge of this spread
(which we call "dispersion" or "variation") is of great importance. Examine the figures in
the following table:
Although the two factories have the same mean output, they are very different in their
week-to-week consistency. Factory A achieves its mean production with only very little
variation from week to week, whereas Factory B achieves the same mean by erratic ups-
and-downs from week to week. This example shows that a mean (or other measure of
location) does not, by itself, tell the whole story and we therefore need to supplement it with
a "measure of dispersion".
Table 6.1
Page 168
As was the case with measures of location, there are several different measures of dispersion
in use by statisticians. Each has its own particular merits and demerits, which will be
discussed later. The measures in common use are:
Range
Quartile deviation
Mean deviation
Standard deviation
We will discuss three of these here.
Page 169
This is the simplest measure of dispersion; it is simply the difference between the largest
value and the smallest. In the example just given, we can see that the lowest weekly
output for Factory A was 90 and the highest was 107; the range is therefore 17. For Factory
B the range is 156 – 36 = 120. The larger range for Factory B shows that it performs less
consistently than Factory A.
The advantage of the range as a measure of the dispersion of a distribution is that it is very
easy to calculate and its meaning is easy to understand. For these reasons it is used a great
deal in industrial quality control work. Its disadvantage is that it is based on only two of the
individual values and takes no account of all those in between. As a result, one or
two extreme results can make it quite unrepresentative. Consequently, the range is not
much used except in the case just mentioned.
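The range calculation for the two factories can be sketched as follows. Only each factory's extremes are quoted in the text (A: 90 to 107; B: 36 to 156); the intermediate weekly outputs here are invented for illustration:

```python
# Range = largest value - smallest value.
# NOTE: only the minima and maxima match the text; the other weekly
# figures are assumed for illustration.
factory_a = [95, 100, 90, 107, 98, 101]
factory_b = [36, 120, 85, 156, 60, 143]

range_a = max(factory_a) - min(factory_a)
range_b = max(factory_b) - min(factory_b)
print(range_a, range_b)  # 17 120
```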
Page 170
Page 171
The Quartile Deviation
This measure of dispersion is sometimes called the "semi-interquartile range". To understand
it, you must cast your mind back to the method of obtaining the median from the ogive. The
median, you remember, is the value which divides the total frequency into two halves. The
values which divide the total frequency into quarters are called quartiles and they can also
be found from the ogive, as shown in Figure 6.1.
Figure 6.1
Page 172
This is the same ogive that we drew earlier when finding the median of the grouped
frequency distribution featured in Section D of the previous study unit.
You will notice that we have added the relative cumulative frequency scale to the right of
the graph. 100% corresponds to 206, i.e. the total frequency. It is then easy to read off the
values of the variate corresponding to 25%, 50% and 75% of the cumulative frequency,
giving the lower quartile (Q1), the median and the upper quartile (Q3) respectively.
Q1 = 46.5
Median = 63 (as found previously)
Q3 = 76
The difference between the two quartiles is the interquartile range and half of the
difference is the semi-interquartile range or quartile deviation:

Quartile deviation = (Q3 − Q1)/2 = (76 − 46.5)/2 = 14.75
Alternatively, you can work out 25% of the total frequency, i.e. 0.25 × 206 = 51.5, and 75% of
the total frequency, i.e. 154.5, and read from the ogive the values of the variate
corresponding to 51.5 and 154.5 on the cumulative frequency scale (i.e. the left-hand
scale). The end result is the same.
Page 173
Calculation of the Quartile Deviation
The quartile deviation is not difficult to calculate and some examination questions may
specifically ask for it to be calculated, in which case a graphical method is not acceptable.
Graphical methods are never quite as accurate as calculations.
We shall again use the same example.
The table of values is reproduced for convenience:
We can make the calculations in exactly the same manner as we used for calculating the
median - we saw this in Section D of the previous study unit.
Table 6.2
0 ‹ 10
10 ‹ 20
20 ‹ 30
30 ‹ 40
40 ‹ 50
50 ‹ 60
60 ‹ 70
70 ‹ 80
80 ‹ 90
90 ‹ 100
100 ‹ 110
110 ‹ 120
Page 174
Looking at Table 6.2, the 51½th item comes in the 40-50 group and will be the (51½ – 36) =
15½th item within it.
Similarly, the upper quartile will be the 154th item which is in the 70-80 group and is the
(154 – 130) = 24th item within it.
Remember that the units of the quartiles and of the median are the same as those of the
variate. The quartile deviation is unaffected by an occasional extreme value. It is not based,
however, on the actual value of all the items in the distribution and to this extent it is less
representative than the standard deviation. In general, when the median is the appropriate
measure of location then the quartile deviation should be used as the measure of dispersion.
Page 175
Deciles and Percentiles
It is sometimes convenient, particularly when dealing with wages and employment
statistics, to consider values similar to the quartiles but which divide the distribution more
finely. Such partition values are deciles and percentiles. From their names you will
probably have guessed that the deciles are the values which divide the total frequency into
tenths and the percentiles are the values which divide the total frequency into hundredths.
Obviously it is only meaningful to consider such values when we have a large total
frequency. The deciles are labelled D1, D2 ... D9: the second decile D2, for example, is the value below
which 20% of the data lies and the sixth decile D6 is the value below which 60% of the data
lies. The percentiles are labelled P1, P2 ... P99 and, for example, P5 is the value below which 5%
of the data lies and P64 is the value below which 64% of the data lies.
Using the same example as above, let us calculate, as an illustration, the third decile D3.
The method follows exactly the same principles as the calculation of the median and
so we are looking for the value of the 61.8th item. A glance at the cumulative frequency
column shows that the 61.8th item lies in the 50-60 group, and is the (61.8 – 60) = 1.8th
item within it.
Therefore 30% of our data lies below 50.6.
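Any partition value of a grouped distribution (quartile, decile or percentile) follows the same interpolation as the median, so one function serves for all of them. A sketch using the D3 figures just worked: the 61.8th of 206 items falls in the 50-60 group with 60 items below it, and a group frequency of 32 is inferred from the cumulative figures quoted in the text (92 at 60, 60 at 50):

```python
# General partition value of a grouped distribution by interpolation.
# p is the required fraction (0.25 for Q1, 0.30 for D3, 0.64 for P64...).
def partition_value(p, total_f, L_bound, c, cum_before, group_f):
    item = p * total_f
    return L_bound + (item - cum_before) / group_f * c

# Third decile D3: 50-60 group; group frequency 32 inferred from the text.
d3 = partition_value(0.30, 206, 50, 10, 60, 32)
print(round(d3, 1))  # 50.6
```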
Page 176
We could also have found this result graphically; again check that you agree with the
calculation by reading D3 from the graph. You will see that the calculation method enables
us to give a more precise answer than is obtainable graphically.
Page 177
Most important of the measures of dispersion is the standard deviation. Except for the use
of the range in statistical quality control and the use of the quartile deviation in wages
statistics, the standard deviation is used almost exclusively in statistical practice. It is
defined as the square root of the variance and so we need to know how to calculate the
variance first.
The Variance
We start by finding the deviations from the mean, and then squaring them, which
removes the negative signs in a mathematically acceptable fashion, thus:
Table 6.3
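The definition just given (square the deviations from the mean, average them to get the variance, then square-root) can be sketched with a small illustrative data set of our own:

```python
# Variance = mean of squared deviations from the mean;
# standard deviation = square root of the variance.
# NOTE: the data set is invented for illustration.
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(values) / len(values)                               # 5.0
variance = sum((x - mean) ** 2 for x in values) / len(values)  # 4.0
sd = math.sqrt(variance)
print(mean, variance, sd)  # 5.0 4.0 2.0
```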
Page 178
Standard Deviation of a Simple Frequency Distribution
If the data had been given as a frequency distribution (as is often the case) then only the
different values would appear in the "x" column and we would have to remember to
multiply each result by its frequency:
Standard Deviation of a Grouped Frequency Distribution
When we come to the problem of finding the standard deviation of a grouped frequency
distribution, we again assume that all the readings in a given group fall at the mid-point of
the group, so we can find the arithmetic mean as before. Let us use the following
distribution, which has mean x̄ = 41.7.
Table 6.4
Page 179
SD = 15.13
The arithmetic is rather tedious even with an electronic calculator, but we can extend the
"short-cut" method which we used for finding the arithmetic mean of a distribution, to
find the standard deviation as well. In that method we:
Worked from an assumed mean.
Worked in class intervals.
Applied a correction to the assumed mean.
Table 6.5
10 ‹ 20
20 ‹ 30
30 ‹ 40
40 ‹ 50
50 ‹ 60
60 ‹ 70
70 ‹ 80
Page 180
Table 6.6 shows you how to work out the standard deviation.
The standard deviation is calculated in four steps from this table, as follows:
Table 6.6
Page 181
This may seem a little complicated, but if you work through the example a few times, it will
all fall into place. Remember the following points:
a) Work from an assumed mean at the mid-point of any convenient class.
b) The correction is always subtracted from the approximate variance.
c) As you are working in class intervals, it is necessary to multiply by the class interval
as the last step.
d) The correction factor is the same as that used for the "short-cut" calculation of the
mean, but for the SD it has to be squared.
e) The column for d² may be omitted, since fd² = fd multiplied by d. But do not omit
it until you have really grasped the principles involved.
f) The assumed mean should be chosen from a group with the most common interval
and c will be that interval. If the intervals vary too much, we revert to the basic method.
Characteristics of the Standard Deviation
In spite of the apparently complicated method of calculation, the standard deviation is the
measure of dispersion used in all but the very simplest of statistical studies. It is based on
all of the individual items, it gives slightly more emphasis to the larger deviations but does
not ignore the smaller ones and, most important, it can be treated mathematically in more
advanced statistics.
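The short-cut standard deviation described above, working in class-interval units with a squared correction subtracted from the approximate variance, can be sketched as follows. The frequencies are illustrative stand-ins, not those of Table 6.5/6.6:

```python
# "Short-cut" standard deviation in class-interval units:
#   SD = c * sqrt( Σfd²/Σf - (Σfd/Σf)² )
# where d is the deviation from the assumed mean in whole class
# intervals and (Σfd/Σf)² is the squared correction term.
# NOTE: the frequencies below are assumed for illustration only.
import math

c = 10
d = [-3, -2, -1, 0, 1, 2, 3]
f = [3, 7, 11, 14, 8, 4, 3]

sum_f = sum(f)                                         # Σf = 50
sum_fd = sum(fi * di for fi, di in zip(f, d))          # Σfd = -9
sum_fd2 = sum(fi * di * di for fi, di in zip(f, d))    # Σfd² = 117 (fd² = fd × d)

variance_units = sum_fd2 / sum_f - (sum_fd / sum_f) ** 2
sd = c * math.sqrt(variance_units)
print(round(sd, 2))  # 15.19
```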
Page 182
Page 183
Suppose that we are comparing the profits earned by two businesses. One of them may
be a fairly large business with average monthly profits of RWF50,000, while the other
may be a small firm with average monthly profits of only RWF2,000. Clearly, the general
level of profits is very different in the two cases, but what about the month-by-month
variability? We will compare the two firms as to their variability by calculating the two
standard deviations; let us suppose that they both come to RWF500. Now, RWF500 is a
much more significant amount in relation to the small firm than it is in relation to the large
firm so that, although they have the same standard deviations, it would be unrealistic to say
that the two businesses are equally consistent in their month-to-month earnings of
profits. To overcome the difficulty, we express the SD as a percentage of the mean in each
case and we call the result the "coefficient of variation".
Applying the idea to the figures which we have just quoted, we get coefficients of variation
(usually indicated in formulae by V or CV) as follows:

Large firm: CV = (500/50,000) × 100% = 1%
Small firm: CV = (500/2,000) × 100% = 25%
This shows that, relatively speaking, the small firm is more erratic in its earnings than the
large firm.
Note that although a standard deviation has the same units as the variate, the coefficient of
variation is a ratio and thus has no units.
Another application of the coefficient of variation comes when we try to compare
distributions the data of which are in different units as, for example, when we try to
compare a French business with an American business. To avoid the trouble of converting
the dollars to euro (or vice versa) we can calculate the coefficients of variation in each case
and thus obtain comparable measures of dispersion.
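The coefficient of variation for the two firms discussed above (means of RWF50,000 and RWF2,000, each with SD RWF500) can be sketched as:

```python
# Coefficient of variation: the SD expressed as a percentage of the
# mean, making dispersion comparable across different general levels
# (or different units).
def cv(sd, mean):
    return sd / mean * 100

print(cv(500, 50_000))  # large firm
print(cv(500, 2_000))   # small firm
```

The large firm's CV is 1% and the small firm's 25%, so the small firm is relatively the more erratic.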
Page 184
Page 185
When the items in a distribution are dispersed equally on each side of the mean, we say
that the distribution is symmetrical. Figure 6.2 shows two symmetrical distributions.
When the items are not symmetrically dispersed on each side of the mean, we say that the
distribution is skew or asymmetric.
A distribution which has a tail drawn out to the right is said to be positively skew, while one
with a tail to the left, is negatively skew. Two distributions may have the same mean and the
same standard deviation but they may be differently skewed. This will be obvious if you
look at one of the skew distributions in Figure 6.3 and then look at the same one through
from the other side of the paper!
Figure 6.2
Figure 6.3
What, then, does skewness tell us? It tells us that we are to expect a few unusually high
values in a positively skew distribution, or a few unusually low values in a negatively skew
distribution.
If a distribution is symmetrical, the mean, mode and median all occur at the same point, i.e.
right in the middle. But in a skew distribution the mean and the median lie somewhere along
the side of the "tail", although the mode is still at the point where the curve is highest.
The more skewed the distribution, the greater the distance from the mode to the mean
and the median, but these two are always in the same order; working outwards from the
mode, the median comes first and then the mean - see Figure 6.4.
For most distributions, except for those with very long tails, the following
relationship holds approximately:
Mean − Mode = 3(Mean − Median)
Figure 6.4
The more skew the distribution, the more spread out are these three measures of location,
and so we can use the amount of this spread to measure the amount of skewness. The most
usual way of doing this is to calculate:

Skewness = (Mean − Mode) ÷ Standard Deviation

or Skewness = 3(Mean − Median) ÷ Standard Deviation
You are expected to use one of these formulae when an examiner asks for the
skewness (or "coefficient of skewness", as some of them call it) of a distribution. When
you do the calculation, remember to get the correct sign (+ or –) when subtracting the mode
or median from the mean and then you will get negative answers from negatively skew
distributions, and positive answers for positively skew distributions. The value of the
coefficient of skewness is between –3 and +3, although values below –1 and above +1 are
rare and indicate very skewed distributions.
Examples of variates with positive skew distributions include size of incomes of a large
group of workers, size of households, length of service in an organisation, and age of a
workforce. Negative skew distributions occur less frequently. One such example is the age at
death for the adult population in Rwanda.
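The second coefficient above can be sketched as a small Python function; the figures in the example are hypothetical, chosen only to illustrate the sign convention:

```python
def pearson_skewness(mean, median, sd):
    """Pearson's second coefficient of skewness: 3(mean - median) / sd."""
    return 3 * (mean - median) / sd

# Hypothetical figures: a mean above the median gives a positive answer,
# indicating a tail drawn out to the right.
skew = pearson_skewness(mean=50, median=47, sd=9)  # 1.0
```

A negative result would indicate a negatively skew distribution, exactly as the sign rule in the text describes.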
The Arithmetic Mean (usually just called the mean)
The Median
The Mode.
Measures of Central Tendency and Dispersion
Averages and variations for ungrouped and grouped data.
Special cases such as the Harmonic mean and the geometric mean
In the last section we described data using graphs, histograms and ogives, mainly for grouped
numerical data. Sometimes we do not want a graph; we want one figure to describe the data.
One such figure is called the average. There are three different averages; all summarise the
data with just one figure, but each one has a different interpretation.
When describing data the most obvious way and the most common way is to get an average
figure. If I said the average amount of alcohol consumed by Rwandan women is 2.6 units per
week then how useful is this information? Usually averages on their own are not much use;
you also need a measure of how spread out the data is. We will deal with the spread of the
data later.
Take the following 11 results, each figure representing a student's mark:
X = 10, 55, 65, 30, 89, 5, 87, 60, 55, 37, 35.
Figure 6.5
What is the average mark? Adding the 11 marks gives 528, so the mean is 528 ÷ 11 = 48. Ranking the marks and taking the middle (6th) value gives a median of 55, and the most frequently occurring mark (the mode) is also 55.
From the above example concerning student’s results, the mean figure is less than the median
figure so if you wished to give the impression to your boss that the results were good you
would use the median as the average rather than the mean. In business, therefore, when quoted
an average figure you need to be aware of which one is being used.
The range = largest number – smallest number = 89 – 5 = 84 gives an idea of how spread out
the data is. This is a useful figure if trying to analyse what the mean is saying. In this case it
would show that the spread of results was very wide and that perhaps it might be better to
divide the class or put on extra classes in future. Remember that the statistics only give you
the information; it is up to you to interpret them. Usually in order to interpret them correctly
you need to delve into the data more and maybe do some further qualitative research.
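The three averages and the range for the 11 marks above can be checked with Python's standard library:

```python
import statistics as st

marks = [10, 55, 65, 30, 89, 5, 87, 60, 55, 37, 35]

mean   = st.mean(marks)           # 528 / 11 = 48
median = st.median(marks)         # middle (6th) ranked mark = 55
mode   = st.mode(marks)           # 55 occurs twice; every other mark once
spread = max(marks) - min(marks)  # range = 89 - 5 = 84
```

Note that the mean (48) is below the median (55), which is why quoting the median here would give the more flattering impression of the results.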
A random sample of 5 weeks showed that a cruise agency received
the following number of weekly specials to the Caribbean:
20 73 75 80 82
(a) Compute the mean, median and mode
(b) Which measure of central tendency best describes the data?
What is the best average, if any, to use in each of the following situations? Justify each
of your answers.
(a) To establish a typical wage to be used by an employer in wage negotiations for a small
company of 300 employees, a few of whom are very highly paid specialists.
(b) To determine the height to construct a bridge (not a draw bridge) where the distribution of
the heights of all ships which would pass under is known and is skewed to the right.
There are THREE different measures of AVERAGE, and three different measures of
dispersion. Once you know the mean and the standard deviation you can tell much more
about the data than if you have the average only.
Figure 6.6
We will begin with the Mean and Standard Deviation. This is very important.
The mean of grouped data is more complex than for raw data because you do not have the
raw figures in front of you, they have already been grouped. To find the mean therefore you
need to find the midpoint of each group and then apply the following formula:
Mean = Σfx ÷ Σf
where x represents the midpoint of each class and f represents the frequency
of that class. Note that if you are given an open-ended class then you must decide yourself
what the midpoint is. The midpoint of the class 5 < 10 is 7.5. For a class such as < 10 you
could say the midpoint is 5 or 8 or whatever you want below 10; it depends on what you
decide is the lower bound of the class.
If you need to get the midpoint of a class 36<56 the easiest way is to add 36+56 and divide
by 2 = 46.
Like all maths you just need to understand one example and then all the others follow the
same pattern. You do need to understand what you are doing though because in your exam
Figure 6.7
you may get a question which has a slight trick and you need to be confident enough to figure
out the approach necessary to continue.
Using the example we had in the last section on statistics grades, we will now work out the
average grade.
Results                f     Midpoint (x)   fx
0 but less than 20           10
20 but less than 30          25
30 but less than 40          35
40 but less than 50    14    45             630
50 but less than 60          55
60 but less than 70          65
70 but less than 80          75
80 but less than 90          85
90 but less than 100         95
Total                  80                   3950

The mean score from the grouped data is:

Mean = Σfx ÷ Σf = 3950 ÷ 80 = 49.375
The Do-It-Better Manufacturing Company operates a shift loading system whereby 60
employees work a range of hours depending on company demands. The following data was recorded:
Hours worked No. of employees
16 < 20 1
20 < 24 2
24 < 28 3
28 < 32 11
32 < 36 14
36 < 40 12
40 < 44 9
44 < 48 5
48 < 52 3
Table 6.6
Table 6.7
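Applying the grouped-mean formula to the Do-It-Better data can be sketched as follows; the midpoints are read off the class limits in the table, e.g. (16 + 20) ÷ 2 = 18:

```python
# Midpoints and frequencies from the Do-It-Better hours-worked table.
midpoints   = [18, 22, 26, 30, 34, 38, 42, 46, 50]
frequencies = [1, 2, 3, 11, 14, 12, 9, 5, 3]

total_f  = sum(frequencies)                                   # 60 employees
total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
mean = total_fx / total_f                                     # 2160 / 60 = 36.0
```

So the mean number of hours worked is 36, the figure used in the rule-of-thumb estimate of the standard deviation below.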
The Standard Deviation
The next thing to estimate is the standard deviation. This is one figure which gives an
indication of how spread out the data is. In the above example the number of hours worked is
between 16 and 52, which is not that spread out, so the standard deviation should be about 7
(a rule of thumb is that 3 standard deviations should bring you from the mean to the highest
or lowest figure in the data set). The mean here is 36, so if we take 36 − 16 = 20 and divide
by 3 we get approximately 7, or we could take (52 − 36) ÷ 3 = 5.3; we take the bigger figure.
However, this is just a rough estimate and not sufficient for your exam. For your exam you
need to apply the formula, so you need to be able to work through it.
S.D. = √( Σ(x − x̄)² ÷ n )   (raw data)

S.D. = √( Σf(x − x̄)² ÷ Σf )   (grouped data)
We will work through an example for finding the standard deviation for raw data first:
Find the standard deviation of the following 5 numbers:
X= 10, 20, 30, 40, 50.
The mean is 30.
Using the table below, the standard deviation equals:

x     Mean   Deviation (x − x̄)   (x − x̄)²
10    30     −20                  400
20    30     −10                  100
30    30     0                    0
40    30     10                   100
50    30     20                   400

Σ(x − x̄)² = 1,000, so S.D. = √(1,000 ÷ 5) = √200 = 14.14.
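The raw-data calculation can be sketched in Python, following the same steps as the table:

```python
import math

data = [10, 20, 30, 40, 50]
mean = sum(data) / len(data)               # 150 / 5 = 30.0
sq_devs = [(x - mean) ** 2 for x in data]  # 400, 100, 0, 100, 400
sd = math.sqrt(sum(sq_devs) / len(data))   # √(1000 / 5) = √200 ≈ 14.14
```

Note this is the population form of the formula (dividing by n), matching the formula given above.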
To work out the standard deviation for the grouped data using the example of the statistics
score we use the formula for the grouped data which is nearly the same as for the raw data
except you need to take into account the frequency with which each group score occurs.
To work out the standard deviation you continue using the same table as before. Look at the
headings on each column: it follows the formula. You need to practise this.

So the standard deviation for the statistics scores is:

S.D. = √(40,768.75 ÷ 80) = √509.61 = 22.57
Table 6.8
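The grouped formula can be checked against the Do-It-Better hours data (where the rule-of-thumb estimate was about 7); the variable names are ours:

```python
import math

midpoints   = [18, 22, 26, 30, 34, 38, 42, 46, 50]
frequencies = [1, 2, 3, 11, 14, 12, 9, 5, 3]

n    = sum(frequencies)                                           # 60
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n     # 36.0
var  = sum(f * (x - mean) ** 2
           for f, x in zip(frequencies, midpoints)) / n           # Σf(x − x̄)² ÷ Σf
sd   = math.sqrt(var)                                             # ≈ 6.99
```

The answer, just under 7, agrees closely with the rough estimate made earlier, which is a useful sanity check on the arithmetic.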
The Median and the Quartiles.
The median is the figure where half the values of the data set lie below it and half above. In
a class of students the median age would be the age of the person where half the class is
younger than that person and half older: the age of the student in the middle.
If you had a class of 11 students, to find the median age, you would line up all the students
starting with the youngest to the oldest. You would then count up to the middle person, the 6th
one along, ask them their age and that is the median age.
∆ ∆ ∆ ∆ ∆ [∆] ∆ ∆ ∆ ∆ ∆
To find the median of raw data you need to firstly rank the figures from smallest to highest
and then choose the middle figure.
For grouped data it is not as easy to rank the data because you don’t have single figures you
have groups. There is a formula which can be used or the median can be found from the
ogive. From the ogive, you go to the half way point on the vertical axis (if this is already in
percentages then up to 50%) and then read the median off the horizontal axis.
If we use the data from the example of the statistics results we used before, you will
remember we drew the ogive from the following data:
[Table of "less than" percentage cumulative frequencies for the statistics results]
Table 6.9
[Ogive: marks (0 to 120) on the horizontal axis, percentage cumulative frequency on the vertical axis]
We can read the median off this and we can also read the quartiles. The median is read by
going up to 50% on the vertical axis and the reading the mark off the horizontal axis. In the
above example it is approximately 48 marks.
Using the formula we can also get the median. The formula is:

Median = L + ((n/2 − F) ÷ f) × c

where L is the lower boundary of the median class, n the total frequency, F the cumulative
frequency up to the median class, f the frequency of the median class and c the class width.
Figure 6.8
To use the formula you take the data in its frequency distribution as follows:

Median = 40 + ((40 − 28) ÷ 14) × 10
       = 40 + 8.57
       = 48.57
The quartiles can be found also from the ogive or using a similar formula to that above.
Quartile 1 measures the mark below which 25% of the class got (33) and quartile 3 represents
the mark below which 75% of the class got (68). These can be read off the ogive at the 25%
mark and the 75% mark.
The interquartile range is found using the formula Q3 − Q1. This indicates the spread about
the median. The semi-interquartile range (which is similar to the standard deviation) is the
interquartile range divided by 2.
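The median formula can be sketched as a small function (the parameter names are ours), checked against the statistics-scores figures used above:

```python
def grouped_median(L, n, F, f, c):
    """L: lower boundary of the median class, n: total frequency,
    F: cumulative frequency before the median class,
    f: frequency of the median class, c: class width."""
    return L + (n / 2 - F) / f * c

# Statistics scores: median class 40-50, cumulative 28 below it,
# class frequency 14, total frequency 80, class width 10.
median = grouped_median(L=40, n=80, F=28, f=14, c=10)  # ≈ 48.57
```

The same function gives the quartiles if n/2 is replaced by n/4 or 3n/4 and the appropriate class figures are substituted.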
Results                Cumulative frequency
0 but less than 20
20 but less than 30
30 but less than 40    28
40 but less than 50
50 but less than 60
60 but less than 70
70 but less than 80
80 but less than 90
90 but less than 100
Table 6.10
[Ogive with quartile readings: percentage cumulative frequency plotted against marks]
For data which is normally distributed the median should lie half way between the two
quartiles; if the data is skewed to the right then the median will be closer to quartile 1. Why?
Percentiles are found in the same way as quartiles; the 10th percentile would be found by
going up to 10% on the vertical axis, and so on.
The Mode
There is no measure of dispersion associated with the mode.
The mode is the most frequently occurring figure in a data set. There is often no mode
particularly with continuous data or there could be a few modes. For raw data you find the
mode by looking at the data as before, or by doing a tally.
For grouped data you can estimate the mode from a histogram by finding the class with the
highest frequency and then estimating.
Figure 6.9
Formula for the mode:

Mode = L + (Δ1 ÷ (Δ1 + Δ2)) × C

To calculate the mode:
1) Determine the modal class, the class with the highest frequency.
2) Find Δ1 = difference between the largest frequency and the frequency immediately
preceding it.
3) Find Δ2 = difference between the largest frequency and the frequency immediately
following it.
L = lower boundary of the modal class; C = modal class width.
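The three steps can be sketched in Python, applied here to the earlier Do-It-Better data, where the modal class is 32 < 36 with frequency 14, preceded by 11 and followed by 12:

```python
def grouped_mode(L, d1, d2, c):
    """L: lower boundary of the modal class, d1/d2: the two frequency
    differences from steps 2 and 3, c: modal class width."""
    return L + d1 / (d1 + d2) * c

mode = grouped_mode(L=32, d1=14 - 11, d2=14 - 12, c=4)  # 32 + (3/5) * 4 ≈ 34.4
```

The estimate is pulled slightly above the class midpoint (34) because the following class is nearly as heavily populated as the preceding one.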
Measures of dispersion: range, variance, standard deviation, coefficient of variation
The range, explained earlier, is found crudely by taking the highest figure in the data set
and subtracting the lowest figure.
The variance is very similar to the standard deviation and measures the spread of the data. If I
had two different classes and the mean result in both classes was the same, but the variance
was higher in class B then results in class B were more spread out. The variance is found by
getting the standard deviation and squaring it.
The standard deviation has been dealt with already.
The co-efficient of variation is used to establish which of two sets of data is relatively more
variable.
about their share price and the standard deviation of share price over the past year.
CV = (Standard Deviation ÷ Mean) × 100%

Comparing the two coefficients of variation of the share prices shows that CBA shares are relatively less variable.
The Harmonic mean: The harmonic mean is used in particular circumstances namely when
data consists of a set of rates such as prices, speed or productivity.
The formula for this is:

Harmonic mean = n ÷ Σ(1/x)
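A hypothetical example (not from the manual) shows why the harmonic mean is the right average for rates:

```python
# A vehicle covers the same distance at 40 km/h and then at 60 km/h.
# Its true average speed is the harmonic mean, not the arithmetic mean (50).
speeds = [40, 60]
harmonic_mean = len(speeds) / sum(1 / s for s in speeds)  # 48.0 km/h
```

More time is spent at the slower speed, so the correct average is pulled below 50, which the harmonic mean captures.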
The Geometric mean: This is used to average proportional increases.
An example will illustrate the use of this and the application of the formula:
It is known that the price of a product has increased by 5%, 2%, 11% and 15% in four
successive years.
The GM is:

GM = ⁴√(1.05 × 1.02 × 1.11 × 1.15) = ⁴√1.367 = 1.081

so the average increase is 8.1% per year.
Table 6.11
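The geometric-mean calculation above can be reproduced directly:

```python
# Year-on-year growth factors: a 5% rise is a factor of 1.05, and so on.
growth = [1.05, 1.02, 1.11, 1.15]

product = 1.0
for g in growth:
    product *= g                    # ≈ 1.367

gm = product ** (1 / len(growth))   # fourth root ≈ 1.081, i.e. 8.1% per year
```

Applying an 8.1% increase four times over reproduces the same overall rise as the four actual increases, which is exactly what "averaging proportional increases" means.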
Dispersion and Skewness:
The normal distribution is used frequently in statistics. It is not skewed and the mean, median
and the mode will all have the same value. So for normally distributed data it does not matter
which measure of average you use as they are all the same.
Data which is skewed looks like this:
Figure 6.10
Figure 6.11
The Normal Distribution
The Normal Distribution
Calculations Using Tables of the Normal Distribution
Tables of the Normal Distribution
Using the Symmetry of the Normal Distribution
Further Probability Calculations
In Study Unit 4, Section E of this module, we considered various graphical ways of
representing a frequency distribution. We considered a frequency dot diagram, a bar chart, a
polygon and a frequency histogram. For a typical histogram, see Figure 7.1. You will
immediately get the impression from this diagram that the values in the centre are much more
likely to occur than those at either extreme.
Consider now a continuous variable in which you have been able to make a very large
number of observations. You could compile a frequency distribution and then draw a
frequency bar chart with a very large number of bars, or a histogram with a very large
number of narrow groups. Your diagrams might look something like those in Figure 7.2.
Figure 7.1
Figure 7.2
If you now imagine that these diagrams relate to relative frequency distribution and that a
smooth curve is drawn through the tops of the bars or rectangles, you will arrive at the idea of
a frequency curve.
Most of the distributions which we get in practice can be thought of as approximations to
distributions which we would get if we could go on and get an infinite total frequency;
similarly, frequency bar charts and histograms are approximations to the frequency curves
which we would get if we had a sufficiently large total frequency. In this course, from now
onwards, when we wish to illustrate frequency distributions without giving actual figures, we
will do so by drawing the frequency curve, as in Figure 7.3.
Figure 7.3
The "Normal" or "Gaussian" distribution is probably the most important distribution in the
whole of statistical theory. It was discovered in the early 18th century, because it seemed to
represent accurately the random variation shown by natural phenomena. For example:
heights of adult men from one race
weights of a species of animals
the distribution of IQ levels in children of a certain age
weights of items packaged by a particular packing machine
life expectancy of light bulbs
A typical shape is shown in Figure 7.4. You will see that it has a central peak (i.e. it is
unimodal) and that it is symmetrical about this centre.
The mean of this distribution is shown as m on the diagram, and is located at the centre. The
standard deviation, which is usually denoted by σ, is also shown.
There are some interesting properties which these curves exhibit, which allow us to carry out
calculations on them. For distributions of this approximate shape, we find that 68% of the
observations are within ±1 standard deviation of the mean, and 95% are within ±2 standard
deviations of the mean. For the normal distribution, these figures are exact. See Figure 7.5.
Figure 7.4
These figures can be expressed as probabilities. For example, if an observation x comes from
a normal distribution with mean m and standard deviation σ, the probability that x is between
(m − σ) and (m + σ) is:
P(m σ < x < m + σ) = 0.68
Also P(m 2σ < x < m + 2σ) = 0.95
Figure 7.5
Tables of the Normal Distribution
Tables exist which allow you to calculate the probability of an observation being within any
range, not just (m − σ) to (m + σ) and (m − 2σ) to (m + 2σ). We show here a set of tables
giving the proportion of the area under various parts of the curve of a normal distribution.
Table 7.1
The figure given in the tables is the proportion of the area in one tail of the distribution. The
area under a section of the curve represents the proportion of observations of that size. For
example, the shaded area shown in Figure 7.6 represents the chance of an observation being
greater than m + 2σ. The vertical line which defines this area is at m + 2σ. Looking up the
value 2 in the table gives:
P(x > m + 2σ) = 0.02275
which is just over 2%.
Similarly, P(x > m + 1σ) is found by looking up the value 1 in the tables. This gives:
P(x > m + 1σ) = 0.1587
which is nearly 16%.
You can extract any value from P(x > m) to P(x > m + 3σ) from the tables. This means that
you can find the area in the tail of the normal distribution wherever the vertical line is drawn
on the diagram.
Figure 7.6
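These tail probabilities can be reproduced without tables, using the error function from Python's standard library (a standard identity, not something the manual itself uses):

```python
import math

def upper_tail(z):
    """P(x > m + z*sigma) for a normally distributed x: the one-tail area."""
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

two_sigma = upper_tail(2)  # ≈ 0.02275, just over 2%
one_sigma = upper_tail(1)  # ≈ 0.1587, nearly 16%
```

Both values agree with the table look-ups quoted above, which is a convenient way to check your reading of the tables.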
Using the Symmetry of the Normal Distribution
Negative distances from the mean are not shown in the tables. Since the distribution is
symmetrical, it is easy to calculate these.
P(x < m − 5σ) = P(x > m + 5σ)
So P(x > m − 5σ) = 1 − P(x < m − 5σ)
This is illustrated in Figure 7.7.
Figure 7.7
Further Probability Calculations
It is possible to calculate the probability of an observation being in the shaded area shown in
Figure 7.8, using values from the tables. This represents the probability that x is between
m − 0.7σ and m + 1.5σ.
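As a sketch of how such an interval probability works out, the same error-function identity as before can be used (the helper name is ours); the probability of being below a point z standard deviations from the mean is accumulated, and the two cumulative values are subtracted:

```python
import math

def below(z):
    """Cumulative probability P(x < m + z*sigma) for a normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# P(m - 0.7*sigma < x < m + 1.5*sigma)
p = below(1.5) - below(-0.7)  # ≈ 0.9332 - 0.2420 = 0.6912
```

With tables instead, the same answer is reached by taking 1 minus the two tail areas at 1.5σ and 0.7σ, using the symmetry property described above.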