For the SPSS Survival Manual website, go to www.allenandunwin.com/spss
This is what readers from around the world say about the SPSS Survival Manual:
‘Best book ever written. My ability to work the maze of statistics and my sanity has been SAVED by this book.’
Natasha Davison, Doctorate of Health Psychology, Deakin University, Australia
‘I just wanted to say how much I value Julie Pallant’s SPSS Survival Manual. It’s quite the best text on SPSS I’ve encountered and I recommend it to anyone who’s listening!’
Professor Carolyn Hicks, Health Sciences, Birmingham University, UK
‘This book was responsible for an A on our educational research project. This is the perfect book for people who are baffled by statistical analysis, but still have to understand and accomplish it.’
Becky, Houston, Texas, USA
‘Truly a survival manual. This was highly recommended to me and was well worth it. I had no difficulty following the steps as they were so well laid out and included screen shots. This book takes the majority of the anxiety out of statistical analysis.’
C. Wright, amazon.com
‘Having perceived myself as one who was not confident in anything statistical, I worked my way through the book and with each turn of the page gained more and more confidence until I was running off analyses with (almost) glee. I now enjoy using SPSS and this book is the reason for that.’
Dr Marina Harvey, Centre for Professional Development, Macquarie University, Australia
‘I have had several courses in advanced statistics, but unfortunately none of them went too “in depth” into SPSS. This book does just that, in a clear “how to” format that gets right to the point and tells you what you need to know.’
John Ryan, Atlanta, Georgia, USA
‘This book really lives up to its name . . . I highly recommend this book to any MBA student carrying out a dissertation project, or anyone who needs some basic help with using SPSS and data analysis techniques.’
Business student, UK
‘I must say how much I value SPSS Survival Manual. It is so clearly written and helpful. I find myself using it constantly and also ask any students doing a thesis or dissertation to obtain a copy.’
Associate Professor Sheri Bauman, Department of Educational Psychology, University of Arizona, USA
‘This book is simple to understand, easy to read and very concise. Those who have a general fear or dislike for statistics or statistics and computers should enjoy reading this book.’
Lloyd G. Waller PhD, Jamaica
‘There are several SPSS manuals published and this one really does do what it says on the tin . . . Whether you are a beginner doing your BSc or struggling with your PhD research (or beyond!), I wholeheartedly recommend this book.’
British Journal of Occupational Therapy, UK
‘I love SPSS Survival Manual . . . I can’t imagine teaching without it. After seeing my copy and hearing me talk about it many of my other colleagues are also utilising it.’
Wendy Close PhD, Psychology Department, Wisconsin Lutheran College, USA
‘. . . being an external student so much of the time is spent teaching myself. But this has been made easier with your manual as I have found much of the content very easy to follow. I only wish I had discovered it earlier.’
Anthropology student, Australia
‘This book is a “must have” introduction to SPSS. Brilliant and highly recommended.’
Dr Joe, South Africa
‘The strength of this book lies in the explanations that accompany the descriptions of tests and I predict great popularity for this text among teachers, lecturers and researchers.’
Roger Watson, Journal of Advanced Nursing
‘This is the one. If you need to do statistics for a thesis, dissertation, course, etc. but aren’t quite sure where to start or what to do, this is the book you have been looking for. I don’t know how I would’ve completed my dissertation without this book. EXTREMELY helpful and easy to understand without being “dumbed down”.’
Thomas A. Delaney, Eugene, Oregon, USA
‘This book is the absolute bible for SPSS users and the book’s cover picture says it all: a true life saver. Without this book I would not be graduating with a doctoral degree.’
A. Preston, Hawaii
‘Pallant’s excellent book has all the ingredients to take interested students, including the statistically naive and the algebraically challenged, to a new level of skill and understanding.’
Geoffrey N. Molloy, Behaviour Change journal
‘I have four SPSS manuals and have found that this is the only manual that explains the issues clearly and is easy to follow. SPSS is evil and anything that makes it less so is fabulous.’
Helen Scott, Psychology Honours Student, University of Queensland, Australia
‘To any students who have found themselves faced with the horror of SPSS when they had signed up for a degree in psychology: this is a godsend.’
Psychology student, Ireland
‘This is the best SPSS manual I’ve had. It’s comprehensive and easy to follow. I really enjoy it.’
Norshidah Mohamed, Kuala Lumpur, Malaysia
‘Julie Pallant saved my life with this book. OK, slight exaggeration but this book really is a life saver . . . If the mere thought of statistics gives you a headache, then this book is for you.’
Statistics student, UK
‘Simply the best book on introductory SPSS that exists. I know nothing about the author but having bought this book in the middle of a statistics open assignment I can confidently say that I love her and want to marry her. There must be dozens of books that claim to be beginners’ guides to SPSS. This one actually does what it says: totally brilliant.’
J. Sutherland, amazon.co.uk
SPSS
SURVIVAL MANUAL
A step by step guide to
data analysis using SPSS
4th edition
Julie Pallant
This fourth edition first published in 2011
Copyright © Julie Pallant 2002, 2005, 2007, 2011
All rights reserved. No part of this book may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, recording or by any information storage
and retrieval system, without prior permission in writing from the publisher. The Australian Copyright
Act 1968 (the Act) allows a maximum of one chapter or 10 per cent of this book, whichever is the
greater, to be photocopied by any educational institution for its educational purposes provided that
the educational institution (or body that administers it) has given a remuneration notice to Copyright
Agency Limited (CAL) under the Act.
Allen & Unwin
83 Alexander Street
Crows Nest NSW 2065
Australia
Phone: (61 2) 8425 0100
Fax: (61 2) 9906 2218
Email: info@allenandunwin.com
Web: www.allenandunwin.com
Cataloguing-in-Publication details are available
from the National Library of Australia
www.librariesaustralia.nla.gov.au
ISBN 978 1 74237 392 8
Set in 11/13.5 pt Minion by Midland Typesetters, Australia
Printed in China at Everbest Printing Co
10 9 8 7 6 5 4 3 2 1
Contents
Preface
Data files and website
Introduction and overview
Part One Getting started
1 Designing a study
2 Preparing a codebook
3 Getting to know SPSS
Part Two Preparing the data file
4 Creating a data file and entering data
5 Screening and cleaning the data
Part Three Preliminary analyses
6 Descriptive statistics
7 Using graphs to describe and explore the data
8 Manipulating the data
9 Checking the reliability of a scale
10 Choosing the right statistic
Part Four Statistical techniques to explore relationships among variables
11 Correlation
12 Partial correlation
13 Multiple regression
14 Logistic regression
15 Factor analysis
Part Five Statistical techniques to compare groups
16 Non-parametric statistics
17 T-tests
18 One-way analysis of variance
19 Two-way between-groups ANOVA
20 Mixed between-within subjects analysis of variance
21 Multivariate analysis of variance
22 Analysis of covariance
Appendix: Details of data files
Recommended reading
References
Index
Preface
For many students, the thought of completing a statistics subject, or using statistics in their research, is a major source of stress and frustration. The aim of the original SPSS Survival Manual (published in 2000) was to provide a simple, step-by-step guide to the process of data analysis using SPSS. Unlike other statistical titles, it did not focus on the mathematical underpinnings of the techniques, but rather on the appropriate use of SPSS as a tool. Since the publication of the first three editions of the SPSS Survival Manual, I have received many hundreds of emails from students who have been grateful for the helping hand (or lifeline).
The same simple approach has been incorporated in this fourth edition. Since the last edition, however, SPSS has undergone a number of changes, including a brief period when it changed name. In 2009, version 18 of the program was renamed PASW Statistics, which stands for Predictive Analytics Software. The name was changed again in 2010 to IBM SPSS. To prevent confusion I have referred to the program as SPSS throughout the book, but all the material applies to programs labelled both PASW and IBM SPSS. All chapters in this edition have been updated to suit version 18 of the package (although most of the material is also suitable for users of earlier versions).
I have resisted urges from students, instructors and reviewers to add too many extra topics, and instead have upgraded and expanded the existing material. This book is not intended to cover all possible statistical procedures available in SPSS, or to answer all questions researchers might have about statistics. Instead, it is designed to get you started with your research and to help you gain confidence in the use of the program to analyse your data. There are many other excellent statistical texts available that you should refer to; suggestions are made throughout each chapter in the book. Additional material is also available on the book’s website (details in the next section).
Data files and website
Throughout the book, you will see examples of research that are taken from a number of data files included on the website that accompanies this book. This website is at:
www.allenandunwin.com/spss
From this site you can download the data files to your hard drive or memory stick by following the instructions on screen. Then you should start SPSS and open the data files. These files can be opened only in SPSS.
The survey4ED.sav data file is a ‘real’ data file, based on a research project that was conducted by one of my graduate diploma classes. So that you can get a feel for the research process from start to finish, I have also included in the Appendix a copy of the questionnaire that was used to generate this data and the codebook used to code the data. This will allow you to follow along with the analyses that are presented in the book, and to experiment further using other variables.
The second data file (error4ED.sav) is the same file as survey4ED.sav, but I have deliberately added some errors to give you practice in Chapter 5 at screening and cleaning your data file.
The third data file (experim4ED.sav) is a manufactured (fake) data file, constructed and manipulated to illustrate the use of a number of techniques covered in Part Five of the book (e.g. Paired Samples t-test, Repeated Measures ANOVA). This file also includes additional variables that will allow you to practise the skills learnt throughout the book. Just don’t get too excited about the results you obtain and attempt to replicate them in your own research!
The fourth file used in the examples in the book is depress4ED.sav. This is used in Chapter 16, on non-parametric techniques, to illustrate some techniques used in health and medical research.
Two other data files have been included, giving you the opportunity to complete some additional activities with data from different discipline areas. The sleep4ED.sav file is a real data file from a study conducted to explore the prevalence and impact of sleep problems on aspects of people’s lives. The staffsurvey4ED.sav file comes from a staff satisfaction survey conducted for a large national educational institution.
See the Appendix for further details of these files (and associated materials). Apart from the data files, the SPSS Survival Manual website also contains a number of useful items for students and instructors, including:
• guidelines for preparing a research report
• practice exercises
• updates on changes to SPSS as new versions are released
• useful links to other websites
• additional reading
• an instructor’s guide.
Introduction and overview
This book is designed for students completing research design and statistics courses and for those involved in planning and executing research of their own. Hopefully this guide will give you the confidence to tackle statistical analyses calmly and sensibly, or at least without too much stress!
Many of the problems that students experience with statistical analysis are due to anxiety and confusion from dealing with strange jargon, complex underlying theories and too many choices. Unfortunately, most statistics courses and textbooks encourage both of these sensations! In this book I try to translate statistics into a language that can be more easily understood and digested.
The SPSS Survival Manual is presented in a structured format, setting out step by step what you need to do to prepare and analyse your data. Think of your data as the raw ingredients in a recipe. You can choose to cook your ‘ingredients’ in different ways: a first course, main course, dessert. Depending on what ingredients you have available, different options may, or may not, be suitable. (There is no point planning to make beef stroganoff if all you have is chicken.) Planning and preparation are an important part of the process (both in cooking and in data analysis). Some things you will need to consider are:
• Do you have the correct ingredients in the right amounts?
• What preparation is needed to get the ingredients ready to cook?
• What type of cooking approach will you use (boil, bake, stir-fry)?
• Do you have a picture in your mind of how the end result (e.g. chocolate cake) is supposed to look?
• How will you tell when it is cooked?
• Once it is cooked, how should you serve it so that it looks appetising?
The same questions apply equally well to the process of analysing your data. You must plan your experiment or survey so that it provides the information you need, in the correct format. You must prepare your data file properly and enter your data carefully. You should have a clear idea of your research questions and how you might go about addressing them. You need to know what statistical techniques are available, and which sorts of variables are suitable and which are not. You must be able to perform your chosen statistical technique (e.g. t-test) correctly and interpret the output. Finally, you need to relate this ‘output’ back to your original research question and know how to present this in your report (or in cooking terms, should you serve your chocolate cake with cream or ice-cream, or perhaps some berries and a sprinkle of icing sugar on top?).
In both cooking and data analysis, you can’t just throw in all your ingredients together, shove it in the oven (or SPSS, as the case may be) and hope for the best. Hopefully this book will help you understand the data analysis process a little better and give you the confidence and skills to be a better ‘cook’.
STRUCTURE OF THIS BOOK
This SPSS Survival Manual consists of 22 chapters, covering the research process from designing a study through to the analysis of the data and presentation of the results. It is broken into five main parts. Part One (Getting started) covers the preliminaries: designing a study, preparing a codebook and becoming familiar with SPSS. In Part Two (Preparing the data file) you will be shown how to prepare a data file, enter your data and check for errors. Preliminary analyses are covered in Part Three, which includes chapters on the use of descriptive statistics and graphs; the manipulation of data; and the procedures for checking the reliability of scales. You will also be guided, step by step, through the sometimes difficult task of choosing which statistical technique is suitable for your data.
In Part Four the major statistical techniques that can be used to explore relationships are presented (e.g. correlation, partial correlation, multiple regression, logistic regression and factor analysis). These chapters summarise the purpose of each technique, the underlying assumptions, how to obtain results, how to interpret the output, and how to present these results in your thesis or report.
Part Five discusses the statistical techniques that can be used to compare groups. These include non-parametric techniques, t-tests, analysis of variance, multivariate analysis of variance and analysis of covariance.
USING THIS BOOK
To use this book effectively as a guide to SPSS, you need some basic computer skills. In the instructions and examples provided throughout the text I assume that you are already familiar with using a personal computer, particularly the Windows functions. I have listed below some of the skills you will need. Seek help if you have difficulty with any of these operations. You will need to be able to:
• use the Windows drop-down menus
• use the left and right buttons on the mouse
• use the click and drag technique for highlighting text
• minimise and maximise windows
• start and exit programs from the Start menu or from Windows Explorer
• move between programs that are running simultaneously
• open, save, rename, move and close files
• work with more than one file at a time, and move between files that are open
• use Windows Explorer to copy files from a memory stick to the hard drive, and back again
• use Windows Explorer to create folders and to move files between folders.
This book is not designed to ‘stand alone’. It is assumed that you have been exposed to the fundamentals of statistics and have access to a statistics text. It is important that you understand some of what goes on ‘below the surface’ when using SPSS. SPSS is an enormously powerful data analysis package that can handle very complex statistical procedures. This manual does not attempt to cover all the different statistical techniques available in the program; only the most commonly used statistics are covered. It is designed to get you started and to develop your confidence in using the program.
Depending on your research questions and your data, it may be necessary to tackle some of the more complex analyses available in SPSS. There are many good books available covering the various statistical techniques in more detail. Read as widely as you can. Browse the shelves in your library and look for books that explain statistics in a language that you understand (well, at least some of it anyway!). Collect this material together to form a resource to be used throughout your statistics classes and your research project. It is also useful to collect examples of journal articles where statistical analyses are explained and results are presented. You can use these as models for your final write-up.
The SPSS Survival Manual is suitable for use both as an in-class text, where you have an instructor taking you through the various aspects of the research process, and as a self-instruction book for those conducting an individual research project. If you are teaching yourself, be sure to actually practise using SPSS by analysing the data that is included on the website accompanying this book (see ‘Data files and website’ for details). The best way to learn is by actually doing, rather than just reading. ‘Play’ with the data files from which the examples in the book are taken before you start using your own data file. This will improve your confidence and also allow you to check that you are performing the analyses correctly.
Sometimes you may find that the output you obtain is different from that presented in the book. This is likely to occur if you are using a different version of SPSS from that used throughout this book (SPSS Statistics 18). SPSS regularly updates its products, which is great in terms of improving the program, but it can lead to confusion for students who find that what is on the screen differs from what is in the book. Usually the difference is not too dramatic, so stay calm and play detective. The information may be there, but just in a different form. For information on changes to the SPSS products you may like to go to the SPSS website (www.spss.com).
RESEARCH TIPS
If you are using this book to guide you through your own research project, there are a
few additional tips I would like to recommend.
• Plan your project carefully. Draw on existing theories and research to guide the design of your project. Know what you are trying to achieve and why.
• Think ahead. Anticipate potential problems and hiccups (every project has them!). Know what statistics you intend to employ and use this information to guide the formulation of data collection materials. Make sure that you will have the right sort of data to use when you are ready to do your statistical analyses.
• Get organised. Keep careful notes of all relevant research, references etc. Work out an effective filing system for the mountain of journal articles you will acquire and, later on, the output from SPSS. It is easy to become disorganised, overwhelmed and confused.
• Keep good records. When using SPSS to conduct your analyses, keep careful records of what you do. I recommend to all my students that they buy a spiral-bound exercise book to record every session they spend on SPSS. You should record the date, new variables you create, all analyses you perform and the names of the files where you have saved the output. If you have a problem or something goes horribly wrong with your data file, this information can be used by your supervisor to help rescue you!
• Stay calm! If this is your first exposure to SPSS and data analysis, there may be times when you feel yourself becoming overwhelmed. Take some deep breaths and use some positive self-talk. Just take things step by step; give yourself permission to make mistakes and become confused sometimes. If it all gets too much then stop, take a walk and clear your head before you tackle it again. Most students find SPSS quite easy to use, once they get the hang of it. Like learning any new skill, you just need to get past that first feeling of confusion and lack of confidence.
• Give yourself plenty of time. The research process, particularly the data entry and data analysis stages, always takes longer than expected, so allow plenty of time for this.
• Work with a friend. Make use of other students for emotional and practical support during the data analysis process. Social support is a great buffer against stress!
ADDITIONAL RESOURCES
There are a number of different topic areas covered throughout this book, from the initial design of a study and questionnaire construction, through basic statistical techniques (t-tests, correlation), to advanced statistics (multivariate analysis of variance, factor analysis). Further reading and resource material is recommended throughout the different chapters in the book. You should try to read as broadly as you can, particularly if tackling some of the more complex statistical procedures.
PART ONE
Getting started
Data analysis is only one part of the research process. Before you can use SPSS to analyse your data, there are a number of things that need to happen. First, you have to design your study and choose appropriate data collection instruments. Once you have conducted your study, the information obtained must be prepared for entry into SPSS (using something called a ‘codebook’). To enter the data you must understand how SPSS works and how to talk to it appropriately. Each of these steps is discussed in Part One.
Chapter 1 provides some tips and suggestions for designing a study, with the aim of obtaining good-quality data. Chapter 2 covers the preparation of a codebook to translate the information obtained from your study into a format suitable for SPSS. Chapter 3 takes you on a guided tour of the program and discusses some of the basic skills that you will need. If this is your first time using SPSS, it is important that you read the material presented in Chapter 3 before attempting any of the analyses presented later in the book.
1
Designing a study
Although it might seem a bit strange to discuss research design in a book on SPSS, it is
an essential part of the research process that has implications for the quality of the data
collected and analysed. The data you enter must come from somewhere—responses to
a questionnaire, information collected from interviews, coded observations of actual
behaviour, or objective measurements of output or performance. The data are only
as good as the instrument that you used to collect them and the research framework
that guided their collection.
In this chapter a number of aspects of the research process are discussed that have an impact on the potential quality of the data. First, the overall design of the study is considered; this is followed by a discussion of some of the issues to consider when choosing scales and measures; and finally, some guidelines for preparing a questionnaire are presented.
PLANNING THE STUDY
Good research depends on the careful planning and execution of the study. There are many excellent books written on the topic of research design to help you with this process, from reviewing the literature and formulating hypotheses through to choosing a study design, selecting and allocating participants, recording observations and collecting data. Decisions made at each of these stages can affect the quality of the data you have to analyse and the way you address your research questions. In designing your own study I would recommend that you take your time working through the design process to make it the best study that you can produce. Reading a variety of texts on the topic will help. A few good, easy-to-follow titles are Stangor (2006), Goodwin (2007) and, if you are working in the area of market research, Boyce (2003). A good basic overview for health and medical research is Peat (2001).
To get you started, consider these tips when designing your study:
• Consider what type of research design (e.g. experiment, survey, observation) is the best way to address your research question. There are advantages and disadvantages to all types of research approaches; choose the most appropriate approach for your particular research question. Have a good understanding of the research that has already been conducted in your topic area.
• If you choose to use an experiment, decide whether a between-groups design (different cases in each experimental condition) or a repeated measures design (same cases tested under all conditions) is the more appropriate for your research question. There are advantages and disadvantages to each approach (see Stangor 2006), so weigh up each approach carefully.
• In experimental studies, make sure you include enough levels in your independent variable. Using only two levels (or groups) means fewer participants are required, but it limits the conclusions that you can draw. Is a control group necessary or desirable? Will the lack of a control group limit the conclusions that you can draw?
• Always select more participants than you need, particularly if you are using a sample of humans. People are notoriously unreliable: they don’t turn up when they are supposed to, they get sick, drop out and don’t fill out questionnaires properly! So plan accordingly. Err on the side of pessimism rather than optimism.
• In experimental studies, check that you have enough participants in each of your groups (and try to keep them equal when possible). With small groups, it is difficult to detect statistically significant differences between groups (an issue of power, discussed in the introduction to Part Five). There are calculations you can perform to determine the sample size that you will need. See, for example, Stangor (2006), or consult other statistical texts under the heading ‘power’.
• Wherever possible, randomly assign participants to each of your experimental conditions, rather than using existing groups. This reduces the problem associated with non-equivalent groups in between-groups designs. It is also worth taking additional measurements of the groups to ensure that they don’t differ substantially from one another. You may be able to statistically control for differences that you identify (e.g. using analysis of covariance).
• Choose appropriate dependent variables that are valid and reliable (see discussion on this point later in this chapter). It is a good idea to include a number of different measures, as some measures are more sensitive than others. Don’t put all your eggs in one basket.
• Try to anticipate the possible influence of extraneous or confounding variables. These are variables that could provide an alternative explanation for your results. Sometimes they are hard to spot when you are immersed in designing the study yourself. Always have someone else (supervisor, fellow researcher) check over your design before conducting the study. Do whatever you can to control for these potential confounding variables. Knowing your topic area well can also help you identify possible confounding variables. If there are additional variables that you cannot control, can you measure them? By measuring them, you may be able to control for them statistically (e.g. using analysis of covariance).
• If you are distributing a survey, pilot-test it first to ensure that the instructions, questions and scale items are clear. Wherever possible, pilot-test on the same type of people who will be used in the main study (e.g. adolescents, unemployed youth, prison inmates). You need to ensure that your respondents can understand the survey or questionnaire items and respond appropriately. Pilot-testing should also pick up any questions or items that may offend potential respondents.
• If you are conducting an experiment, it is a good idea to have a full dress rehearsal and to pilot-test both the experimental manipulation and the dependent measures you intend to use. If you are using equipment, make sure it works properly. If you are using different experimenters or interviewers, make sure they are properly trained and know what to do. If different observers are required to rate behaviours, make sure they know how to appropriately code what they see. Have a practice run and check for inter-rater reliability (i.e. how consistent scores are from different raters). Pilot-testing of the procedures and measures helps you identify anything that might go wrong on the day and any additional contaminating factors that might influence the results. Some of these you may not be able to predict (e.g. workers doing noisy construction work just outside the lab’s window), but try to control those factors that you can.
CHOOSING APPROPRIATE SCALES AND MEASURES
There are many different ways of collecting data, depending on the nature of your
research. This might involve measuring output or performance on some objective
criteria, or rating behaviour according to a set of specified criteria. It might also
involve the use of scales that have been designed to ‘operationalise’ some underlying
construct or attribute that is not directly measurable (e.g. self-esteem). There are
many thousands of validated scales that can be used in research. Finding the right one
for your purpose is sometimes difficult. A thorough review of the literature in your
topic area is the first place to start. What measures have been used by other
researchers in the area? Sometimes the actual items that make up the scales are included in
the appendix to a journal article; otherwise you may need to trace back to the original
article describing the design and validation of the scale you are interested in. Some
scales have been copyrighted, meaning that to use them you need to purchase ‘official’
copies from the publisher. Other scales, which have been published in their entirety
in journal articles, are considered to be ‘in the public domain’, meaning that they
can be used by researchers without charge. It is very important, however, to properly
acknowledge each of the scales you use, giving full reference details.
In choosing appropriate scales there are two characteristics that you need
to be aware of: reliability and validity. Both of these factors can influence the
quality of the data you obtain. When reviewing possible scales to use, you should
collect information on the reliability and validity of each of the scales. You will
need this information for the ‘Method’ section of your research report. No matter
how good the reports are concerning the reliability and validity of your scales, it
is important to pilot-test them with your intended sample. Sometimes scales are
reliable with some groups (e.g. adults with an English-speaking background), but
are totally unreliable when used with other groups (e.g. children from non-English-
speaking backgrounds).
Reliability
The reliability of a scale indicates how free it is from random error. Two frequently
used indicators of a scale’s reliability are test-retest reliability (also referred to as
‘temporal stability’) and internal consistency. The test-retest reliability of a scale
is assessed by administering it to the same people on two different occasions, and
calculating the correlation between the two scores obtained. High test-retest correlations
indicate a more reliable scale. You need to take into account the nature of the
construct that the scale is measuring when considering this type of reliability. A scale
designed to measure current mood states is not likely to remain stable over a period
of a few weeks. The test-retest reliability of a mood scale, therefore, is likely to be low.
You would, however, hope that measures of stable personality characteristics would
stay much the same, showing quite high test-retest correlations.
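To make the test-retest idea concrete, the short Python sketch below computes the Pearson correlation between the scores that the same people obtained on two occasions. It is an illustration only (you would normally obtain this correlation in SPSS), and both the scores and the function name pearson_r are hypothetical.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = mean(x), mean(y)
    # Sample covariance (n - 1 denominator, matching statistics.stdev)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


# Hypothetical scale totals for the same five people, tested a few weeks apart
time1 = [22, 30, 18, 25, 27]
time2 = [23, 29, 17, 26, 28]

print(f"test-retest r = {pearson_r(time1, time2):.2f}")  # test-retest r = 0.97
```

A scale measuring a stable trait would be expected to give a high value like this; a current-mood scale administered a few weeks apart would usually give a noticeably lower one.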
The second aspect of reliability that can be assessed is internal consistency. This
is the degree to which the items that make up the scale are all measuring the same
underlying attribute (i.e. the extent to which the items ‘hang together’). Internal
consistency can be measured in a number of ways. The most commonly used statistic
is Cronbach’s coefficient alpha (available using SPSS, see Chapter 9). This statistic
provides an indication of the average correlation among all of the items that make up
the scale. Values range from 0 to 1, with higher values indicating greater reliability.
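The usual formula for Cronbach’s alpha compares the sum of the individual item variances with the variance of the total scale score: alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below applies that formula to a small, made-up set of responses to three items. The item names op1 to op3 follow the Optimism Scale naming used later in this book, but the data and the function name are hypothetical; the sketch only illustrates the kind of value SPSS reports (Chapter 9 covers the SPSS procedure).

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    Each element of `items` is one item's scores, with respondents listed
    in the same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    sum_item_var = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))


# Hypothetical responses from five people (1 = strongly disagree ... 5 = strongly agree)
op1 = [4, 5, 2, 3, 4]
op2 = [4, 4, 2, 3, 5]
op3 = [3, 5, 1, 3, 4]

alpha = cronbach_alpha([op1, op2, op3])
print(f"alpha = {alpha:.2f}")  # alpha = 0.94
```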
While different levels of reliability are required, depending on the nature and
purpose of the scale, Nunnally (1978) recommends a minimum level of .7. Cronbach
alpha values are dependent on the number of items in the scale. When there are a
small number of items in the scale (fewer than 10), Cronbach alpha values can be
quite small. In this situation it may be better to calculate and report the mean inter-
item correlation for the items. Optimal mean inter-item correlation values range from
.2 to .4 (as recommended by Briggs & Cheek 1986).
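For short scales, the mean inter-item correlation can be found by correlating every pair of items and averaging the results. The Python sketch below does this for a made-up four-item scale; the data and function names are hypothetical, and the .2 to .4 band it checks against is the Briggs and Cheek (1986) guideline given in the paragraph above.

```python
from itertools import combinations
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


def mean_inter_item_r(items):
    """Average Pearson correlation over every distinct pair of items."""
    return mean(pearson_r(a, b) for a, b in combinations(items, 2))


# Hypothetical responses to a short four-item scale from six people
items = [
    [3, 4, 2, 5, 3, 4],
    [2, 4, 3, 4, 3, 5],
    [4, 3, 2, 4, 2, 4],
    [3, 5, 3, 3, 4, 4],
]
r_bar = mean_inter_item_r(items)
print(f"mean inter-item r = {r_bar:.2f}")
print("within .2 to .4:", 0.2 <= r_bar <= 0.4)
```

Note that an item everyone answered identically has no variance, so its correlations (and this sketch) are undefined; such an item would need checking before any reliability analysis.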
Validity
The validity of a scale refers to the degree to which it measures what it is supposed to
measure. Unfortunately, there is no one clear-cut indicator of a scale’s validity. The
validation of a scale involves the collection of empirical evidence concerning its use.
The main types of validity you will see discussed are content validity, criterion validity
and construct validity.
Content validity refers to the adequacy with which a measure or scale has sampled
from the intended universe or domain of content. Criterion validity concerns the
relationship between scale scores and some specified, measurable criterion. Construct
validity involves testing a scale not against a single criterion but in terms of theoretically
derived hypotheses concerning the nature of the underlying variable or construct. The
construct validity of a scale is explored by investigating its relationship with other constructs,
both related (convergent validity) and unrelated (discriminant validity). An easy-to-
follow summary of the various types of validity is provided in Stangor (2006) and in
Streiner and Norman (2008).
If you intend to use scales in your research, it would be a good idea to read further
on this topic: see Kline (2005) for information on psychological tests, and Streiner and
Norman (2008) for health measurement scales. Bowling also has some great books on
health and medical scales.
PREPARING A QUESTIONNAIRE
In many studies it is necessary to collect information from your participants or
respondents. This may involve obtaining demographic information from participants prior
to exposing them to some experimental manipulation. Alternatively, it may involve the
design of an extensive survey to be distributed to a selected sample of the population. A
poorly planned and designed questionnaire will not give good data with which to address
your research questions. In preparing a questionnaire, you must consider how you intend
to use the information; you must know what statistics you intend to use. Depending on
the statistical technique you have in mind, you may need to ask the question in a particular
way, or provide different response formats. Some of the factors you need to consider in the
design and construction of a questionnaire are outlined in the sections that follow.
This section only briefly skims the surface of questionnaire design, so I would
suggest that you read further on the topic if you are designing your own study. A really
great book for this purpose is De Vaus (2002) or, if your research area is business,
Boyce (2003).
Question types
Most questions can be classified into two groups: closed or open-ended. A closed
question involves offering respondents a number of defined response choices. They are
asked to mark their response using a tick, cross, circle, etc. The choices may be a simple
Yes/No, Male/Female, or may involve a range of different choices. For example:
What is the highest level of education you have completed (please tick)?
1. Primary school
2. Some secondary school
3. Completed secondary school
4. Trade training
5. Undergraduate university
6. Postgraduate university
Closed questions are usually quite easy to convert to the numerical format required
for SPSS. For example, Yes can be coded as a 1, No can be coded as a 2; Males as 1,
Females as 2. In the education question shown above, the number corresponding to
the response ticked by the respondent would be entered. For example, if the
respondent ticked Undergraduate university, this would be coded as a 5. Numbering each of
the possible responses helps with the coding process. For data entry purposes, decide
on a convention for the numbering (e.g. in order across the page, and then down),
and stick with it throughout the questionnaire.
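If it helps to think of a coding scheme as a lookup table, the Python sketch below writes out the codes described above (Yes = 1, No = 2; Males = 1, Females = 2; and the printed numbers for the education question). The dictionary and function names are hypothetical; the point is simply that every response option maps to exactly one number, and an unlisted response is flagged rather than silently coded.

```python
YES_NO = {"Yes": 1, "No": 2}
SEX = {"Male": 1, "Female": 2}
# Codes match the numbers printed beside each option on the education question
EDUCATION = {
    "Primary school": 1,
    "Some secondary school": 2,
    "Completed secondary school": 3,
    "Trade training": 4,
    "Undergraduate university": 5,
    "Postgraduate university": 6,
}


def code_response(response, codes):
    """Return the numeric code for a ticked option, or raise if it is not listed."""
    if response not in codes:
        raise ValueError(f"{response!r} is not one of the listed options")
    return codes[response]


print(code_response("Undergraduate university", EDUCATION))  # 5
print(code_response("No", YES_NO))                          # 2
```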
Sometimes you cannot guess all the possible responses that respondents might
make—it is therefore necessary to use open-ended questions. The advantage here is
that respondents have the freedom to respond in their own way, not restricted to the
choices provided by the researcher. For example:
What is the major source of stress in your life at the moment?
___________________________________________________________________
___________________________________________________________________
Responses to open-ended questions can be summarised into a number of different
categories for entry into SPSS. These categories are usually identified after looking
through the range of responses actually received from the respondents. Some
possibilities could also be raised from an understanding of previous research in the area.
Each of these response categories is assigned a number (e.g. work=1, finances=2,
relationships=3), and this number is entered into SPSS. More details on this are provided
in the section on preparing a codebook in Chapter 2.
Sometimes a combination of both closed and open-ended questions works best.
This involves providing respondents with a number of defined responses, and also
an additional category (other) that they can tick if the response they wish to give is
not listed. A line or two is provided so that they can write the response they wish to
give. This combination of closed and open-ended questions is particularly useful in
the early stages of research in an area, as it gives an indication of whether the defined
response categories adequately cover all the responses that respondents wish to give.
Response format
In asking respondents a question, you also need to decide on a response format. The
type of response format you choose can have implications when you come to do your
statistical analysis. Some analyses (e.g. correlation) require scores that are continuous,
from low through to high, with a wide range of scores. If you had asked respondents
to indicate their age by giving them a category to tick (e.g. 30 or under, 31 to 50,
and over 50), these data would not be suitable to use in a correlational analysis.
So, if you intend to explore the correlation between age and, say, self-esteem, you
will need to ensure that you ask respondents for their actual age in years. Be warned
though, some people don’t like giving their exact age (e.g. women over 30!).
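Collecting exact ages works because the conversion only goes one way: exact ages can always be collapsed into bands later, but banded answers can never be turned back into exact ages. A minimal Python sketch of that collapse follows; the band edges (30 or under, 31 to 50, over 50) and the codes 1 to 3 are illustrative assumptions, and in SPSS itself you would use the recoding procedures covered in Chapter 8.

```python
def age_band(age):
    """Collapse an exact age in years into one of three hypothetical bands."""
    if age <= 30:
        return 1  # 30 or under
    if age <= 50:
        return 2  # 31 to 50
    return 3      # over 50


ages = [24, 30, 31, 50, 67]          # exact ages, as recorded on the questionnaire
bands = [age_band(a) for a in ages]
print(bands)                         # [1, 1, 2, 2, 3]
```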
Try to provide as wide a choice of responses to your questions as possible. You can
always condense things later if you need to (see Chapter 8). Don’t just ask respondents
whether they agree or disagree with a statement—use a Likert-type scale, which can
range from strongly disagree to strongly agree:
strongly disagree 1 2 3 4 5 6 strongly agree
This type of response scale gives you a wider range of possible scores, and increases the
range of statistical analyses that are available to you. You will need to make a decision
concerning the number of response steps (e.g. 1 to 6) that you use. DeVellis (2003) has a good
discussion concerning the advantages and disadvantages of different response scales.
Whatever type of response format you choose, you must provide clear instructions. Do
you want your respondents to tick a box, circle a number, make a mark on a line? For
some respondents, this may be the first questionnaire that they have completed. Don’t
assume they know how to respond appropriately. Give clear instructions, provide an
example if appropriate, and always pilot-test on the type of people that will make up
your sample. Iron out any sources of confusion before distributing hundreds of your
questionnaires. In designing your questions, always consider how a respondent might
interpret the question and all the possible responses a person might want to make.
For example, you may want to know whether people smoke or not. You might ask the
question:
Do you smoke? (please tick) Yes No
In trialling this questionnaire, your respondent might ask whether you mean
cigarettes, cigars or marijuana. Is knowing whether they smoke enough? Should you also
find out how much they smoke (two or three cigarettes, versus two or three packs),
and/or how often they smoke (every day or only on social occasions)? The message
here is to consider each of your questions, what information they will give you and
what information might be missing.
Wording the questions
There is a real art to designing clear, well-written questionnaire items. Although there
are no clear-cut rules that can guide this process, there are some things you can do to
improve the quality of your questions, and therefore your data. Try to avoid:
• long complex questions
• double negatives
• double-barrelled questions
• jargon or abbreviations
• culture-specific terms
• words with double meanings
• leading questions
• emotionally loaded words.
When appropriate, you should consider including a response category for ‘Don’t
know’ or ‘Not applicable’. For further suggestions on writing questions, see De Vaus
(2002) and Kline (2005).
2
Preparing a codebook
Before you can enter the information from your questionnaire, interviews or
experiment into SPSS, it is necessary to prepare a codebook. This is a summary of the
instructions you will use to convert the information obtained from each subject or
case into a format that SPSS can understand. The steps involved will be demonstrated
in this chapter using a data file that was developed by a group of my graduate diploma
students. A copy of the questionnaire, and the codebook that was developed for this
questionnaire, can be found in the Appendix. The data file is provided on the website
that accompanies this book. The provision of this material allows you to see the whole
process, from questionnaire development through to the creation of the final data file
ready for analysis. Although I have used a questionnaire to illustrate the steps involved
in the development of a codebook, a similar process is also necessary in experimental
studies, or when retrieving information from existing records (e.g. hospital medical
records).
Preparing the codebook involves deciding (and documenting) how you will go
about:
• defining and labelling each of the variables
• assigning numbers to each of the possible responses.
All this information should be recorded in a book or computer file. Keep this
somewhere safe; there is nothing worse than coming back to a data file that you haven’t
used for a while and wondering what the abbreviations and numbers refer to.
In your codebook you should list all of the variables in your questionnaire, the
abbreviated variable names that you will use in SPSS and the way in which you will
code the responses. In this chapter simplified examples are given to illustrate the various
steps. In the first column of Table 2.1 you have the name of the variable (in English,
rather than in computer talk). In the second column you write the abbreviated name
for that variable that will appear in SPSS (see conventions below), and in the third
column you detail how you will code each of the responses obtained.
Variable names
Each question or item in your questionnaire must have a unique variable name. Some
of these names will clearly identify the information (e.g. sex, age). Other questions,
such as the items that make up a scale, may be identified using an abbreviation (e.g.
op1, op2, op3 are used to identify the items that make up the Optimism Scale).
There are a number of conventions you must follow in assigning names to your
variables in SPSS. These are set out in the ‘Rules for naming of variables’ box. In
earlier versions of SPSS (prior to Version 12), you could use only eight characters
for your variable names. The later versions of the program allow you longer variable
names, but very long names can make the output rather hard to read, so keep them as
concise as possible.
Rules for naming of variables
Variable names:
• must be unique (i.e. each variable in a data set must have a different name)
• must begin with a letter (not a number)
• cannot include full stops, spaces or symbols (! , ? * “)
• cannot include words used as commands by SPSS (all, ne, eq, to, le, lt, by,
or, gt, and, not, ge, with)
• cannot exceed 64 characters.
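If you are setting up many variables, it can help to check proposed names against the rules in the box before typing them into SPSS. The Python sketch below does this. It is a deliberately conservative reading of the box (after the first letter it accepts only letters, digits and underscores), so treat it as a planning aid for your codebook rather than a copy of SPSS’s own checking; the function name is hypothetical, and names that differ only in upper or lower case are treated as duplicates because SPSS does not distinguish case in variable names.

```python
import re

# Words the box says cannot be used as variable names (compared case-insensitively)
RESERVED = {"all", "ne", "eq", "to", "le", "lt", "by",
            "or", "gt", "and", "not", "ge", "with"}
MAX_LENGTH = 64
# Assumption: a letter first, then only letters, digits or underscores
VALID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def name_problems(name, existing_names=()):
    """Return a list of rule violations for a proposed variable name (empty if none)."""
    problems = []
    if not VALID_PATTERN.fullmatch(name):
        problems.append("must start with a letter and contain no spaces, full stops or symbols")
    if name.lower() in RESERVED:
        problems.append("is a word used as an SPSS command")
    if len(name) > MAX_LENGTH:
        problems.append(f"is longer than {MAX_LENGTH} characters")
    if name.lower() in {n.lower() for n in existing_names}:
        problems.append("is already used by another variable")
    return problems


print(name_problems("op1", existing_names=["id", "sex", "age"]))  # []
print(name_problems("1st.score"))
```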
Table 2.1 Example of a codebook

Variable                        SPSS variable name   Coding instructions
Identification number           ID                   Number assigned to each survey
Sex                             Sex                  1 = Males
                                                     2 = Females
Age                             Age                  Age in years
Marital status                  Marital              1 = single
                                                     2 = steady relationship
                                                     3 = married for the first time
                                                     4 = remarried
                                                     5 = divorced/separated
                                                     6 = widowed
Optimism Scale items 1 to 6     op1 to op6           Enter the number circled from
                                                     1 (strongly disagree) to
                                                     5 (strongly agree)
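A codebook such as Table 2.1 can also help you catch data-entry mistakes, because it tells you which codes are legal for each variable. The Python sketch below stores the coded variables from Table 2.1 as sets of allowed values and reports any entry that falls outside them. The record is invented, and ID and Age are deliberately left unchecked because the table does not restrict them to a fixed list of codes.

```python
# Allowed codes for the coded variables in Table 2.1
CODEBOOK = {
    "Sex": {1, 2},                  # 1 = Males, 2 = Females
    "Marital": {1, 2, 3, 4, 5, 6},  # 1 = single ... 6 = widowed
}
# Optimism Scale items op1 to op6: 1 (strongly disagree) to 5 (strongly agree)
CODEBOOK.update({f"op{i}": {1, 2, 3, 4, 5} for i in range(1, 7)})


def out_of_range(record):
    """List (variable, value) pairs in a record that the codebook does not allow."""
    return [(var, record[var])
            for var, allowed in CODEBOOK.items()
            if var in record and record[var] not in allowed]


# Hypothetical record containing two data-entry errors
record = {"ID": 17, "Sex": 2, "Age": 34, "Marital": 7, "op1": 4, "op2": 6}
print(out_of_range(record))  # [('Marital', 7), ('op2', 6)]
```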
The first variable in any data set should be ID, that is, a unique number that
identifies each case. Before beginning the data entry process, go through and assign a
number to each of the questionnaires or data records. Write the number clearly on the
front cover. Later, if you find an error in the data set, having the questionnaires or data
records numbered allows you to check back and find where the error occurred.
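Because ID is what lets you trace an error back to the original questionnaire, it is worth checking that no number has been used twice or skipped. The Python sketch below assumes, hypothetically, that the questionnaires were numbered consecutively from 1; it returns any duplicated IDs and any gaps in the sequence.

```python
from collections import Counter


def check_ids(ids):
    """Return (duplicated IDs, missing IDs), assuming numbering runs 1, 2, 3 ..."""
    counts = Counter(ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    expected = set(range(1, max(ids) + 1)) if ids else set()
    missing = sorted(expected - set(ids))
    return duplicates, missing


entered_ids = [1, 2, 2, 4, 5]  # hypothetical IDs as typed into the data file
print(check_ids(entered_ids))  # ([2], [3])
```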
CODING RESPONSES
Each response must be assigned a numerical code before it can be entered into SPSS.
Some of the information will already be in this format (e.g. age in years); other
variables such as sex will need to be converted to numbers (e.g. 1=males, 2=females). If
you have used numbers in your questions to label your responses (see, for example,
the education question in Chapter 1), this is relatively straightforward. If not, decide
on a convention and stick to it. For example, code the first listed response as 1, the
second as 2 and so on across the page.
What is your current marital status? (please tick)
single in a relationship married divorced
To code responses to the question above: if a person ticked single, they would
be coded as 1; if in a relationship, they would be coded 2; if married, 3; and if
divorced, 4.
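The convention of numbering options in the order they are listed is mechanical enough to express in a couple of lines. The Python sketch below assigns 1, 2, 3 ... to the options in page order and reproduces the marital status codes just described; the function name is hypothetical.

```python
def codes_in_listed_order(options):
    """Number response options 1, 2, 3 ... in the order they appear on the page."""
    return {option: code for code, option in enumerate(options, start=1)}


marital_codes = codes_in_listed_order(["single", "in a relationship", "married", "divorced"])
print(marital_codes["married"])  # 3
```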
CODING OPEN-ENDED QUESTIONS
For open-ended questions (where respondents can provide their own answers), coding
is slightly more complicated. Take, for example, the question: What is the major source
of stress in your life at the moment? To code responses to this, you will need to scan
through the questionnaires and look for common themes. You might notice a lot of
respondents listing their source of stress as related to work, finances, relationships,
health or lack of time. In your codebook you list these major groups of responses under
the variable name stress, and assign a number to each (work=1, spouse/partner=2 and
so on). You also need to add another numerical code for responses that did not fall
into these listed categories (other=99). When entering the data for each respondent,
you compare his/her response with those listed in the codebook and enter the appro-
priate number into the data set under the variable stress.
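Deciding which theme a written answer belongs to is a judgement you make while reading the questionnaires; once that decision is made, the coding itself is a lookup with a catch-all code. The Python sketch below shows only that final step. The categories follow the themes mentioned above, but the numbers after work=1 and the function name are hypothetical; anything not in the list falls through to other=99.

```python
# Hypothetical codebook entries for the variable "stress"
STRESS_CODES = {
    "work": 1,
    "finances": 2,
    "relationships": 3,
    "health": 4,
    "lack of time": 5,
}
OTHER = 99  # responses that do not fit any listed category


def code_stress(theme):
    """Return the code for a theme chosen by the coder, or 99 if it is not listed."""
    return STRESS_CODES.get(theme.strip().lower(), OTHER)


print([code_stress(t) for t in ["Work", "health", "noisy neighbours"]])  # [1, 4, 99]
```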
Once you have drawn up your codebook, you are almost ready to enter your data.
First you need to get to know SPSS (Chapter 3), and then you need to set up a data file
and enter your data (Chapter 4).
3
Getting to know SPSS
SPSS operates using a number of different screens, or ‘windows’, designed to do
different things. Before you can access these windows, you need to either open an existing
data file or create one of your own. So, in this chapter we will cover how to open and
close SPSS; how to open and close existing data files; and how to create a data file from
scratch. We will then go on to look at the different windows SPSS uses.
STARTING SPSS
There are a number of different ways to start SPSS:
• The simplest way is to look for an SPSS icon on your desktop. Place your cursor
on the icon and double-click.
• You can also start SPSS by clicking on Start, moving your cursor to All Programs,
and then across to the list of programs available. See if you have a folder labelled
SPSS Inc, which should contain the option SPSS Statistics 18. This may vary
depending on your computer and the SPSS licence that you have.
• SPSS will also start up if you double-click on an SPSS data file listed in Windows
Explorer (these files have a .sav extension).
When you open SPSS, you may encounter a front cover screen asking ‘What would
you like to do?’ It is easier to close this screen (click on the cross in the top right-hand
corner) and to use the menus.
OPENING AN EXISTING DATA FILE
If you wish to open an existing data file (e.g. survey4ED.sav, one of the files included
on the website that accompanies this book; see p. viii), click on File from the menu
across the top of the screen, and then choose Open, and then slide across to Data. The
Open File dialogue box will allow you to search through the various directories on
your computer to find where your data file is stored.
You should always open data files from the hard drive of your computer. If you
have data on a memory stick or flash drive, transfer it to a folder on the hard drive
of your computer before opening it. Find the file you wish to use and click on Open.
Remember, all SPSS data files have a .sav extension. The data file will open in front of
you in what is labelled the Data Editor window (more on this window later).
WORKING WITH DATA FILES
In SPSS, you are allowed to have more than one data file open at any one time. This
can be useful, but also potentially confusing. You must keep at least one data file open
at all times. If you close a data file, SPSS will ask if you would like to save the file
before closing. If you don’t save it, you will lose any data you may have entered and
any recoding or computing of new variables that you may have done since the fi le was
opened.
Saving a data file
When you first create a data file or make changes to an existing one (e.g. creating new
variables), you must remember to save your data file. This does not happen
automatically. If you don’t save regularly and there is a power blackout or you accidentally press
the wrong key (it does happen!), you will lose all of your work. So save yourself the
heartache and save regularly.
To save a file you are working on, go to the File menu (top left-hand corner) and
choose Save. Or, if you prefer, you can also click on the icon that looks like a floppy
disk, which appears on the toolbar at the top left of your screen. This will save your
file to whichever drive you are currently working on. This should always be the hard
drive: working from a flash drive is a recipe for disaster! I have had many students
come to me in tears after corrupting their data file by working from an external drive
rather than from the hard disk.
When you first save a new data file, you will be asked to specify a name for the
file and to indicate a directory and a folder in which it will be stored. Choose the
directory and then type in a file name. SPSS will automatically give all data file names
the extension .sav. This is so that it can recognise it as a data file. Don’t change this
extension, otherwise SPSS won’t be able to find the file when you ask for it again later.
Opening a different data file
If you finish working on a data file and wish to open another one, click on File, select
Open, and then slide across to Data. Find the directory where your second file is
stored. Click on the desired file and then click the Open button. This will open the
second data file, while still leaving the first data file open in a separate window. It
is a good idea to close files that you are not currently working on, as it can get very
confusing having multiple files open.
Starting a new data file
Starting a new data file is easy. Click on File, then, from the drop-down menu, click
on New and then Data. From here you can start defining your variables and entering
your data. Before you can do this, however, you need to understand a little about
the windows and dialogue boxes that SPSS uses. These are discussed in the next
section.
WINDOWS
The main windows you will use in SPSS are the Data Editor, the Viewer, the Pivot
Table Editor, the Chart Editor and the Syntax Editor. These windows are summarised
here, but are discussed in more detail in later sections of this book.
When you begin to analyse your data, you will have a number of these windows
open at the same time. Some students find this idea very confusing. Once you get the
hang of it, it is really quite simple. You will always have the Data Editor open because
this contains the data fi le that you are analysing. Once you start to do some analyses,
you will have the Viewer window open because this is where the results of all your
analyses are displayed, listed in the order in which you performed them.
The different windows are like pieces of paper on your desk: you can shuffle
them around, so that sometimes one is on top and at other times another. Each of
the windows you have open will be listed along the bottom of your screen. To change
windows, just click on whichever window you would like to have on top. You can also
click on Window on the top menu bar. This will list all the open windows and allow
you to choose which you would like to display on the screen.
Sometimes the windows that SPSS displays do not initially fill the screen. It is
much easier to have the Viewer window (where your results are displayed) enlarged
on top, filling the entire screen. To do this, look on the top right-hand area of your
screen. There should be three little buttons or icons. Click on the middle button to
maximise that window (i.e. to make your current window fill the screen). If you wish
to shrink it again, just click on this middle button.
Data Editor window
The Data Editor window displays the contents of your data file, and in this window
you can open, save and close existing data files, create a new data file, enter data, make
changes to the existing data file, and run statistical analyses (see Figure 3.1).
Viewer window
When you start to do analyses, the Viewer window should open automatically (see
Figure 3.2). If it does not open automatically, click on Window from the menu and this
should be listed. This window displays the results of the analyses you have conducted,
including tables and charts. In this window you can modify the output, delete it, copy
it, save it, or even transfer it into a Word document.
The Viewer screen consists of two parts. On the left is an outline or navigation
pane, which gives you a full list of all the analyses you have conducted. You can use
this side to quickly navigate your way around your output (which can become very
long). Just click on the section you want to move to and it will appear on the right-
hand side of the screen. On the right-hand side of the Viewer window are the results
of your analyses, which can include tables and graphs (also referred to as charts in
SPSS).
Saving output
When you save the output from SPSS, it is saved in a separate file with a .spv
extension, to distinguish it from data files, which have a .sav extension. If you are using a
version of SPSS prior to version 18, your output will be given a .spo extension. To
read these older files in SPSS Statistics 18, you will need to download a Legacy Viewer
program from the SPSS website.
Figure 3.1 Example of a Data Editor window
To save the results of your analyses, you must have the Viewer window open on
the screen in front of you. Click on File from the menu at the top of the screen. Click
on Save. Choose the directory and folder in which you wish to save your output,
and then type in a file name that uniquely identifies your output. Click on Save. To
name my files, I use an abbreviation that indicates the data file I am working on and
the date I conducted the analyses. For example, the file survey8may2009.spv would
contain the analyses I conducted on 8 May 2009 using the survey data file. I keep a log
book that contains a list of all my file names, along with details of the analyses that
were performed. This makes it much easier for me to retrieve the results of specific
analyses. When you begin your own research, you will find that you can very quickly
accumulate a lot of different files containing the results of many different analyses.
To prevent confusion and frustration, get organised and keep good records of the
analyses you have done and of where you have saved the results.
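If you want to apply this naming convention consistently, the name can be generated from the date. The Python sketch below rebuilds the example name survey8may2009.spv from a data-file abbreviation and the date of the analysis, and formats a matching line for a log book. The function names, the tab-separated log format and the example description are assumptions for illustration; SPSS does not produce such a log.

```python
from datetime import date

# Fixed month abbreviations, so the result does not depend on the computer's locale
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def output_file_name(data_abbrev, run_date):
    """Data-file abbreviation + day + lower-case month + year, with the .spv extension."""
    return f"{data_abbrev}{run_date.day}{MONTHS[run_date.month - 1]}{run_date.year}.spv"


def log_entry(file_name, description):
    """One tab-separated line for a hypothetical analysis log book."""
    return f"{file_name}\t{description}"


name = output_file_name("survey", date(2009, 5, 8))
print(name)  # survey8may2009.spv
print(log_entry(name, "Descriptives for age and sex"))
```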
It is important to note that the output file (with a .spv extension) can only be
opened in SPSS. This can be a problem if you, or someone who needs to read the
output, does not have SPSS available. To get around this problem, you may choose to
‘export’ your SPSS results. If you wish to save the entire output, select File from the
menu and then choose Export. You can choose the format that you would like to use
(e.g. pdf, Word/rtf). Saving as a Word/rtf file means that you will be able to modify the
tables in Word. Use the Browse button to identify the folder you wish to save the file
into, specify a suitable name in the Save File pop-up box that appears and then click
on Save and then OK.
Figure 3.2 Example of Viewer window
If you don’t want to save the whole file, you can select specific parts of the output
to export. Select these in the Viewer window using the left-hand navigation pane.
With the selections highlighted, select File from the menu and choose Export. In the
Export Output dialog box you will need to tick the box at the top labelled Selected
and then select the format of the file and the location you wish to save to.
Printing output
You can use the navigation pane (left-hand side) of the Viewer window to select
particular sections of your results to print out. To do this, you need to highlight the
sections that you want. Click on the first section you want, hold down the Ctrl key
on your keyboard and then just click on any other sections you want. To print these
sections, click on the File menu (from the top of your screen) and choose Print. SPSS
will ask whether you want to print your selected output or the whole output.
Pivot Table Editor window
The tables you see in the Viewer window (which SPSS calls pivot tables) can be
modified to suit your needs. To modify a table you need to double-click on it, which
takes you into what is known as the Pivot Table Editor. You can use this editor to
change the look of your table, the size, the fonts used and the dimensions of the
columns—you can even swap the presentation of variables around (transpose rows
and columns).
If you click the right mouse button on a table in the Viewer window, a pop-up
menu of options that are specific to that table will appear. If you double-click on a
table and then click on your right mouse button even more options appear, including
the option to Create Graph using these results. You may need to highlight the part
of the table that you want to graph by holding down the Ctrl key while you select the
parts of the table you want to display.
Chart Editor window
When you ask SPSS to produce a histogram, bar graph or scatterplot, it initially displays
these in the Viewer window. If you wish to make changes to the type or presentation
of the chart, you need to go into the Chart Editor window by double-clicking on your
chart. In this window you can modify the appearance and format of your graph, change
the fonts, colours, patterns and line markers (see Figure 3.3). The procedure to generate
charts and to use the Chart Editor is discussed further in Chapter 7.
Syntax Editor window
In the good old days, all SPSS commands were given using a special command
language or syntax. SPSS still creates these sets of commands to run each of the
programs, but all you usually see are the Windows menus that ‘write’ the commands
for you. Although the options available through the SPSS menus are usually all that
most undergraduate students need to use, there are some situations when it is useful
to go behind the scenes and to take more control over the analyses that you wish to
conduct.
Syntax is a good way of keeping a record of what commands you have used,
particularly when you need to do a lot of recoding of variables or computing new
variables (demonstrated in Chapter 8). It is also useful when you need to repeat a lot
of analyses or generate a number of similar graphs.
You can use the normal SPSS menus to set up the basic commands of a particular
statistical technique and then ‘paste’ these to the Syntax Editor using the Paste
button provided with each procedure (see Figure 3.4). The Syntax Editor allows you to
copy and paste commands, and to make modifications to the commands generated by
SPSS. Quite complex commands can also be written to allow more sophisticated
recoding and manipulation of the data. SPSS has a Command Syntax Reference under the
Help menu if you would like additional information. (Warning: this is not for
beginners; it is quite complex to follow.)
The commands pasted to the Syntax Editor are not executed until you choose to
run them. To run a command, highlight it (making sure you include the final full
stop), or select it from the left-hand side of the screen, and then click on the Run
menu option or the arrow icon on the toolbar. Extra comments can be added to the
syntax file by starting them with an asterisk (see Figure 3.4).
Syntax is stored in a separate text file with a .sps extension. Make sure you have
the Syntax Editor open in front of you and then select File from the menu. Select the
Save option from the drop-down menu, choose the location you wish to save the file
to and then type in a suitable file name. Click on the Save button.
The syntax file (with the extension .sps) is normally opened in SPSS, although, being
a plain text file, it can also be read in a text editor such as Notepad. Sometimes it
may be useful to copy and paste the syntax text from the Syntax Editor into a Word
document so that you (or others) can view it even if SPSS is not available. To do
this, hold down the left mouse button and drag the cursor over the syntax you wish to
copy. Choose Edit from the menu and then select Copy from the drop-down menu. Open a
Word document and paste this material using the Edit, Paste option or hold the Ctrl
key down and press V on the keyboard.
Figure 3.3 Example of a Chart Editor window
MENUS
Within each of the windows described above, SPSS provides you with quite a bewilder-
ing array of menu choices. These choices are displayed in drop-down menus across the
top of the screen, and also as icons. Try not to become overwhelmed; initially, just learn
the key ones, and as you get a bit more confident you can experiment with others.
DIALOGUE BOXES
Once you select a menu option, you will usually be asked for further information. This
is done in a dialogue box. Figure 3.5 shows the dialogue box that appears when you
use the Frequencies procedure to get some descriptive statistics. To see this, click on
Analyze from the menu at the top of the screen, and then select Descriptive Statistics
and then slide across and select Frequencies. This will display a dialogue box asking
you to nominate which variables you want to use (see Figure 3.5).
Selecting variables in a dialogue box
To indicate which variables you want to use you need to highlight the selected
variables in the list provided (by clicking on them), then click on the arrow button in
the centre of the screen to move them into the empty box labelled Variable(s). You
can select variables one at a time, clicking on the arrow each time, or you can select a
group of variables. If the variables you want to select are all listed together, just click
on the first one, hold down the Shift key on your keyboard and press the down arrow
key until you have highlighted all the desired variables. Click on the arrow button and
all of the selected variables will move across into the Variable(s) box.
Figure 3.4 Example of a Syntax Editor window
If the variables you want to select are spread throughout the variable list, you should
click on the first variable you want, hold down the Ctrl key, move the cursor down to
the next variable you want and then click on it, and so on. Once you have all the desired
variables highlighted, click on the arrow button. They will move into the box.
To remove a variable from the box, you just reverse the process. Click on the
variable in the Variable(s) box that you wish to remove, click on the arrow button,
and it shifts the variable back into the original list. You will notice the direction of the
arrow button changes, depending on whether you are moving variables into or out of
the Variable(s) box.
Dialogue box buttons
In most dialogue boxes you will notice a number of standard buttons (OK, Paste,
Reset, Cancel and Help; see Figure 3.5). The uses of each of these buttons are:
• OK: Click on this button when you have selected your variables and are ready to
run the analysis or procedure.
• Paste: This button is used to transfer the commands that SPSS has generated in
this dialogue box to the Syntax Editor. This is useful if you wish to keep a record
of the command or repeat an analysis a number of times.
• Reset: This button is used to clear the dialogue box of all the previous commands
you might have given when you last used this particular statistical technique or
procedure. It gives you a clean slate to perform a new analysis, with different
variables.
• Cancel: Clicking on this button closes the dialogue box and cancels all of the
commands you may have given in relation to that technique or procedure.
• Help: Click on this button to obtain information about the technique or
procedure you are about to perform.
Figure 3.5 Example of a Frequencies dialogue box
Although I have illustrated the use of dialogue boxes in Figure 3.5 by using Frequen-
cies, all dialogue boxes work on the same basic principle. Each will have a series of
buttons with a variety of options relating to the specific procedure or analysis. These
buttons will open subdialogue boxes that allow you to specify which analyses you wish
to conduct or which statistics you would like displayed.
CLOSING SPSS
When you have finished your SPSS session and wish to close the program down, click
on the File menu at the top left of the screen. Click on Exit. SPSS will prompt you to
save your data file and a file that contains your output. You should not rely on the fact
that SPSS will prompt you to save when closing the program. It is important that you
save both your output and your data file regularly throughout your session. If SPSS
crashes or there is a power cut, you will lose any work you have not saved.
GETTING HELP
If you need help while using SPSS or don’t know what some of the options refer to,
you can use the in-built Help menu. Click on Help from the menu bar and a number
of choices are offered. You can ask for specific topics, work through a Tutorial, or
consult the Statistics Coach, which takes you step by step through the decision-making
process involved in choosing the right statistic to use. This is not designed to replace
your statistics books, but it may prove a useful guide.
Within each of the major dialogue boxes there is also a Help button that
will assist you with the procedure you have selected.
PART TWO
Preparing the data file
Preparation of the data file for analysis involves a number of steps. These include
creating the data file and entering the information obtained from your study in a
format defined by your codebook (covered in Chapter 2). The data file then needs to
be checked for errors, and these errors corrected. Part Two of this book covers these
two steps. In Chapter 4, the procedures required to create a data file and enter the
data are discussed. In Chapter 5, the process of screening and cleaning the data file is
covered.
4
Creating a data file and entering data
There are a number of stages in the process of setting up a data file and analysing the
data. The flow chart of the data analysis process, shown later in this chapter, outlines
the main steps that are needed. In this chapter I will lead you through the process of
creating a data file and entering the data.
To prepare a data file, three key steps are covered in this chapter:
• Step 1. The first step is to check and modify, where necessary, the options that
SPSS uses to display the data and the output that is produced.
• Step 2. The next step is to set up the structure of the data file by ‘defining’ the
variables.
• Step 3. The final step is to enter the data, that is, the values obtained from each
participant or respondent for each variable.
To illustrate these procedures I have used the data fi le survey4ED.sav, which is
described in the Appendix. The codebook used to generate these data is also provided
in the Appendix.
Data files can also be ‘imported’ from other spreadsheet-type programs (e.g.
Excel). This can make the data entry process much more convenient, particularly
for students who don’t have SPSS on their home computers. You can set up a basic
data file in Excel and enter the data at home. When complete, you can then import
the file into SPSS and proceed with the data manipulation and data analysis stages.
The instructions for using Excel to enter the data are provided later in this chapter.
CHANGING THE SPSS ‘OPTIONS’
Before you set up your data file, it is a good idea to check the SPSS options that govern
the way your data and output are displayed. The options allow you to define how your
variables will be displayed, the type of tables that will be displayed in the output and
many other aspects of the program. Some of this will seem confusing at first, but once
you have used the program to enter data and run some analyses you may want to refer
back to this section.
Flow chart of data analysis process
1. Prepare codebook (Chapter 2)
2. Set up structure of data file (Chapter 4)
3. Enter data (Chapter 4)
4. Screen data file for errors (Chapter 5)
5. Explore data using descriptive statistics and graphs (Chapters 6 and 7)
6. Modify variables for further analyses (Chapter 8)
7. Conduct statistical analyses, following one of two branches:
• To explore relationships (Part 4): Correlation (Chapter 11); Partial correlation
(Chapter 12); Multiple regression (Chapter 13); Logistic regression (Chapter 14);
Factor analysis (Chapter 15)
• To compare groups (Part 5): Non-parametric techniques (Chapter 16); T-tests
(Chapter 17); Analysis of variance (Chapters 18, 19, 20); Multivariate analysis of
variance (Chapter 21); Analysis of covariance (Chapter 22)
If you are sharing a computer with other people (e.g. in a computer lab), it is worth
being aware of these options. Sometimes other students will change these options,
which can dramatically influence how the program appears. It is useful to know how
to change things back to the way you want them.
To open the Options screen, click on Edit from the menu at the top of the screen
and then choose Options. The screen shown in Figure 4.1 should appear. There are
a lot of choices listed, many of which you won’t need to change. I have described the
key ones below, organised by the tab they appear under. To move between the various
tabs, just click on the one you want. Don’t click on OK until you have finished all the
changes you want to make, across all the tabs.
General tab
When you come to do your analyses, you can ask for your variables to be listed in
alphabetical order or by the order in which they appear in the file. I always use the
file order, because this is consistent with the order of the questionnaire items and the
codebook. To keep the variables in file order, just make sure the option File in
the Variable Lists section is selected.
Figure 4.1 Example of an Options screen
In the Output section on the right-hand side, place a tick in the box No scientific
notation for small numbers in tables. This stops very small values in the output of your
statistical analyses from being displayed in scientific notation, which can look very
strange. In the Notification section, make sure the options Raise viewer window and
Scroll to new output are selected. This means that when you conduct an analysis the
Viewer window will appear, and the new output will be displayed on the screen.
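To see what this option avoids, compare the two ways of writing a very small value, such as a probability of 0.000012. The Python string formatting below only illustrates the notation; SPSS applies its own display rules, and the value is invented.

```python
p_value = 0.000012  # invented example of a very small probability

scientific = f"{p_value:.2E}"  # scientific notation
fixed = f"{p_value:.3f}"       # ordinary notation, three decimal places

print(scientific)  # 1.20E-05
print(fixed)       # 0.000
```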
Data tab
Click on the Data tab to make changes to the way that your data file is displayed. If
your variables do not involve values with decimal places, you may like to change the
display format for all your variables. In the section labelled Display Format for New
Numeric Variables, change the Decimal Places value to 0. This means that all new
variables will not display any decimal places. This changes only how the values are
displayed, not the values that are stored, but it simplifies the appearance of your data file.
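The difference between display and storage can be sketched in Python, with the format specification `.0f` standing in for a display format of 0 decimal places. The values are invented; the point is that a stored value of 2.6 is shown as 3 but remains 2.6 for any calculation.

```python
stored_values = [3.0, 4.0, 2.6]  # invented values as stored in the data file

# Show each value with 0 decimal places (a display choice only).
displayed = [f"{value:.0f}" for value in stored_values]

print(displayed)      # ['3', '4', '3']
print(stored_values)  # [3.0, 4.0, 2.6]; the stored values are unchanged
```

If some of your variables do have decimal places, it is safer to set their Decimals value individually in Variable View so that what you see matches what is stored.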
Output Labels tab
The options in this section allow you to customise how you want the variable names
and value labels displayed in your output. In the very bottom section under Variable
values in labels are shown as: choose Values and Labels from the drop-down options.
This will allow you to see both the numerical values and the explanatory labels in the
tables that are generated in the Viewer window.
Charts tab
Click on the Charts tab if you wish to change the appearance of your charts. You can
alter the Chart Aspect Ratio if you wish. You can also make other changes to the way
in which the chart is displayed (e.g. font, colour, lines).
Pivot Tables tab
SPSS presents most of the results of the statistical analyses in tables called pivot tables.
Under the Pivot Tables tab you can choose the format of these tables from an extensive
list. It is a matter of experimenting to find a style that best suits your needs. When I am
first doing my analyses, I use a style called CompactBoxed. This saves space (and paper
when printing). However, this style is not suitable for importing into documents that
are being sent for publication in a journal because it includes vertical lines. The styles
listed as ‘Academic’ may be more suitable here as they do not use vertical lines.
You can change the table styles as often as you like—just remember that you have
to change the style before you run the analysis. You cannot change the style of the
tables after they appear in your output, but you can modify many aspects (e.g. font
sizes, column width) by using the Pivot Table Editor. This can be activated by double-
clicking on the table that you wish to modify.
Once you have made all the changes you wish to make on the various Options
tabs, click on OK. You can then proceed to define your variables and enter your data.
DEFINING THE VARIABLES
Before you can enter your data, you need to tell SPSS about your variable names and
coding instructions. This is called ‘defining the variables’. You will do this in the Data
Editor window (see Figure 4.2). The Data Editor window consists of two different
views: Data View and Variable View. You can move between these two views using the
little tabs at the bottom left-hand side of the screen.
You will notice that in the Data View window each of the columns is labelled var.
These will be replaced with the variable names that you listed in your codebook (see
Figure 4.2). Down the side you will see the numbers 1, 2, 3 and so on. These are the
case numbers that SPSS assigns to each of your lines of data. These are not the same
as your ID numbers, and these case numbers change if you sort your file or split your
file to analyse subsets of your data.
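The difference between case numbers and ID numbers can be seen in a small Python sketch with three invented cases. The case number is simply a row's position, so sorting renumbers the rows, while each ID stays attached to its own case. The helper `case_numbers` is hypothetical.

```python
# Invented cases: each has its own ID number and an age.
cases = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 21},
    {"id": 3, "age": 45},
]


def case_numbers(rows):
    """Return (case number, ID) pairs; case numbers are just positions 1, 2, 3..."""
    return [(position, row["id"]) for position, row in enumerate(rows, start=1)]


print(case_numbers(cases))  # [(1, 1), (2, 2), (3, 3)]

# Sort by age: the IDs move with their cases, but case numbers are reassigned.
sorted_cases = sorted(cases, key=lambda row: row["age"])
print(case_numbers(sorted_cases))  # [(1, 2), (2, 1), (3, 3)]
```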
Procedure for defining your variables
To define each of the variables that make up your data file, you first need to click
on the Variable View tab at the bottom left of your screen. In this view (see
Figure 4.3) the variables are listed down the side, with their characteristics listed
along the top (name, type, width, decimals, label etc.).
Your job now is to define each of your variables by specifying the required
information for each variable listed in your codebook. Some of the information you will
need to provide yourself (e.g. name); other bits are provided automatically using
default values. These default values can be changed if necessary. The key pieces of
information that are needed are described below. The headings I have used correspond to
the column headings displayed in the Variable View. I have provided the simple
step-by-step procedures below; however, there are a number of shortcuts that you can use
once you are comfortable with the process. These are listed later, in the section headed
‘Optional shortcuts’. You should become familiar with the basic techniques first.
Figure 4.2 Data Editor window
Name
In this column, type in the brief variable name that will be used to identify each of the
variables in the data file (listed in your codebook). Keep these variable names as short as
possible, not exceeding 64 characters. They must follow the naming conventions
specified by SPSS (listed in Chapter 2). Each variable name must be unique, must start with
a letter, and cannot contain spaces or symbols. For ideas on how to name your variables,
have a look at the codebooks provided in the Appendix. These list the variable names
used in data files that accompany this book (see p. viii for details of these files).
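These rules can be checked mechanically before you type names into Variable View. The Python sketch below is a hypothetical checker, not an SPSS tool. It applies the simplified rules given above (letter first, no spaces or symbols, at most 64 characters, unique), with two assumptions labelled in the code: underscores are allowed (SPSS also permits a few other characters, such as the full stop), and SPSS's reserved keywords (ALL, AND, BY and so on) are rejected. Uniqueness is checked without regard to case, because SPSS does not distinguish Sex from sex.

```python
import re

# SPSS reserved keywords, which cannot be used as variable names.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# Simplified rule: a letter first, then letters, digits or underscores
# (allowing the underscore is an assumption of this sketch).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_names(names, max_length=64):
    """Return (name, problem) pairs; an empty list means every name passes."""
    problems = []
    seen = set()
    for name in names:
        if len(name) > max_length:
            problems.append((name, "too long"))
        if not NAME_PATTERN.fullmatch(name):
            problems.append((name, "must start with a letter and contain no spaces or symbols"))
        if name.upper() in RESERVED:
            problems.append((name, "reserved keyword"))
        if name.lower() in seen:  # names are compared ignoring case
            problems.append((name, "duplicate"))
        seen.add(name.lower())
    return problems


names = ["id", "sex", "total optimism", "2ndscore", "Sex", "by"]
for name, problem in check_names(names):
    print(f"{name}: {problem}")
```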
Type
The default value for Type that will appear automatically as you enter your first variable
name is Numeric. For most purposes, this is all you will need to use. There are some
circumstances where other options may be appropriate. For example, if you need to enter
text information (e.g. a person’s surname), you need to change the type to String. A Date
option is also available if your data includes dates. To change the variable type, click in the
cell and a box with three dots should appear giving you the options available. You can also
use this window to adjust the width of the variable and the number of decimal places.
Width
The default value for Width is 8. This is usually sufficient for most data. If your
variable has very large values (or you have requested a string variable), you may need
to change this default value; otherwise, leave it as is.
Figure 4.3 Variable View
Decimals
The default value for Decimals is usually 2 (however, this can be changed using the
Options facility described earlier in this chapter). If your variable has decimal places,
change this to suit your needs.
Label
The Label column allows you to provide a longer description for your variable than
the one used in the Name column. This will be used in the output generated from the analyses
conducted by SPSS. For example, you may wish to give the label Total Optimism to
your variable TOPTIM.
Values
In the Values column you can define the meaning of the values you have used to code
your variables. I will demonstrate this process for the variable Sex.
1. Click on the three dots on the right-hand side of the cell. This opens the
Value Label dialogue box.
2. Click in the box marked Value. Type in 1.
3. Click in the box marked Label. Type in Male.
4. Click on Add. You will then see in the summary box: 1=Male.
5. Repeat for females: Value: enter 2, Label: enter Female. Add.
6. When you have finished defining all the possible values (as listed in your
codebook), click on OK.
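Conceptually, value labels are a lookup from each code to its description: the numbers are what is stored and analysed, and the labels are used when results are displayed (for example, when you choose Values and Labels on the Output Labels tab). The Python dictionary below is only an analogy for that idea, not SPSS's internal format, and the responses are invented.

```python
# Value labels for sex, as defined in the Value Label dialogue box.
sex_labels = {1: "Male", 2: "Female"}

responses = [1, 2, 2, 1]  # invented data: the codes actually stored in the file

# The stored numbers are looked up only when labels are needed for display.
labelled = [sex_labels[code] for code in responses]
print(labelled)  # ['Male', 'Female', 'Female', 'Male']

# One way of showing each value together with its label.
for code, label in sorted(sex_labels.items()):
    print(code, label)
```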
Missing
Sometimes researchers assign specific values to indicate missing values for their data.
This is not essential: SPSS will recognise any blank cell as missing data. So if you
intend to leave a blank when a piece of information is not available, it is not necessary
to do anything with this Variable View column.
If you do intend to use specific missing value codes (e.g. 99=not applicable), you
must specify this value in the Missing section, otherwise SPSS will use the value as a
legitimate value in any statistical analyses. Click in the cell and then on the shaded box
with three dots that appears. Choose the option Discrete missing values and type the
value (e.g. 99) in the space provided. Up to three values can be specified. Click on OK.
If you are using these special codes, it is also a good idea to go back and label these
values in the Values column.
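The reason for declaring missing value codes comes down to simple arithmetic. In the Python illustration below (invented scores on a 1 to 5 scale, with 99 meaning not applicable), leaving 99 undeclared inflates the mean from 4 to 27.75. The set `MISSING_CODES` is a hypothetical stand-in for the Discrete missing values setting.

```python
MISSING_CODES = {99}  # stands in for 99 entered under Discrete missing values

scores = [3, 4, 99, 5]  # invented responses; 99 means "not applicable"

# 99 NOT declared as missing: it is treated as a genuine score.
mean_all = sum(scores) / len(scores)
print(mean_all)  # 27.75

# 99 declared as missing: it is left out before the statistic is calculated.
valid = [score for score in scores if score not in MISSING_CODES]
mean_valid = sum(valid) / len(valid)
print(mean_valid)  # 4.0
```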
Columns
The default column width is usually set at 8, which is sufficient for most purposes.
Change it only if necessary to accommodate your values or long variable names.
Align
The alignment of the columns is usually set at ‘right’ alignment. There is no need to
change this.
Measure
The column heading Measure refers to the level of measurement of each of your vari-
ables. The default is Scale, which refers to continuous data measured at interval or ratio
level of measurement. If your variable consists of categories (e.g. sex), click in the cell and
then on the arrow key that appears. Choose Nominal for categorical data and Ordinal if
your data involve rankings or ordered values (e.g. level of education completed).
Optional shortcuts
The process described above can be rather tedious if you have a large number of
variables in your data file. There are a number of shortcuts you can use to speed up
the process. If you have a number of variables that have the same attributes (e.g.
type, width, decimals), you can set the first variable up correctly and then copy these
attributes to one or more other variables.
Copying variable definition attributes to one other variable
1. In Variable View, click on the cell that has the attribute you wish to copy
(e.g. Width).
2. From the menu, click on Edit and then Copy.
3. Click on the same attribute cell for the variable you wish to apply this to.
4. From the menu, click on Edit and then Paste.
Copying variable definition attributes to a number of other variables
1. In Variable View, click on the cell that has the attribute you wish to copy
(e.g. Width).
2. From the menu, click on Edit and then Copy.
3. Click on the same attribute cell for the first variable you wish to copy to
and then, holding your left mouse button down, drag the cursor down
the column to highlight all the variables you wish to copy to.
4. From the menu, click on Edit and then Paste.
Setting up a series of new variables all with the same attributes
If your data consists of scales made up of a number of individual items, you can
create the new variables and define the attributes of all of these items in one go. The
procedure is detailed below, using the six items of the Optimism Scale as an example
(optim1 to optim6). If you want to practise this as an exercise, you should start a new
data file (File, New, Data).
1. In Variable View, define the attributes of the first variable (optim1)
following the instructions provided earlier. This would involve defining
the value labels 1=strongly disagree, 2=disagree, 3=neutral, 4=agree,
5=strongly agree.
2. With the Variable View selected, click on the row number of this variable
(this should highlight the whole row).
3. From the menu, select Edit and then Copy.
4. Click on the row number of the next empty row.
5. From the menu, select Edit and then Paste Variables.
6. In the dialogue box that appears, enter the number of additional
variables you want to add (in this case, 5). Enter the prefix you wish to
use (optim) and the number you wish the new variables to start on (in this
case, 2). Click on OK.
This will give you five new variables (optim2, optim3, optim4, optim5 and optim6).
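The names created in step 6 follow a simple pattern: the prefix followed by a running number. The hypothetical Python helper below (not an SPSS function) reproduces that pattern for the optimism example and for the self-esteem items, where sest1 is defined first and nine more are added starting at 2.

```python
def paste_variable_names(prefix, start, count):
    """Names for `count` new variables: prefix plus numbers from `start` upwards."""
    return [f"{prefix}{number}" for number in range(start, start + count)]


print(paste_variable_names("optim", 2, 5))
# ['optim2', 'optim3', 'optim4', 'optim5', 'optim6']

print(paste_variable_names("sest", 2, 9)[-1])  # sest10
```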
To set up all of the items in other scales, just repeat the process detailed above
(e.g. sest1 to sest10 for the self-esteem items). Remember, this procedure is suitable
only for items that have all the same attributes; it is not appropriate if the items have
different response scales (e.g. if some are categorical and others continuous), or if the
values are coded differently.
ENTERING DATA
Once you have defined each of your variable names and given them value labels (where
appropriate), you are ready to enter your data. Make sure you have your codebook ready.
Procedure for entering data
1. To enter data, you need to have the Data View active. Click on the Data
View tab at the bottom left-hand side of the screen of the Data Editor
window. A spreadsheet should appear with your newly defined variable
names listed across the top.
2. Click on the first cell of the data set (first column, first row).
3. Type in the number (if this variable is ID, this should be 1).
4. Press the right arrow key on your keyboard; this will move the cursor into
the second cell, ready to enter your second piece of information for case
number 1.
5. Move across the row, entering all the information for case 1, making sure
that the values are entered in the correct columns.
6. To move back to the start, press the Home key on your keyboard (on some
computers you may need to hold the Ctrl key or the Fn key down and
then press the Home key). Press the down arrow to move to the second
row, and enter the data for case 2.
7. If you make a mistake and wish to change a value, click in the cell that
contains the error. Type in the correct value and then press the right
arrow key.
After you have defined your variables and entered your data, your Data Editor window
should look something like that shown previously in Figure 3.1.
If you have entered value labels for some of your variables (e.g. Sex: 1=male,
2=female), you can choose to have these labels displayed in the Data Editor window
instead of just the numbers. To do this, click on View from the menu and select the
option Value Labels. This option can also be activated during the data entry process so
that you can choose an option from a drop-down menu, rather than typing a number
in each cell. This is slower, but does ensure that only valid numbers are entered. To
turn this option off, go to View and click on Value Labels again to remove the tick.
MODIFYING THE DATA FILE
After you have created a data file, you may need to make changes to it (e.g. to add,
delete or move variables, or to add or delete cases). Make sure you have the Data
Editor window open on the screen, showing Data View.
Delete a case
Move down to the case (row) you wish to delete. Position your cursor in the shaded
section on the left-hand side that displays the case number. Click once to highlight
the row. Press the Delete button on your computer keyboard. You can also click on the
Edit menu and click on Clear.
Insert a case between existing cases
Move your cursor to a cell in the case (row) immediately below where you would like
the new case to appear. Click on the Edit menu and choose Insert Cases. An empty
row will appear in which you can enter the data of the new case.
Delete a variable
Position your cursor in the shaded section (which contains the variable name) above the
column you wish to delete. Click once to highlight the whole column. Press the Delete
button on your keyboard. You can also click on the Edit menu and click on Clear.
Insert a variable between existing variables
Position your cursor in a cell in the column (variable) to the right of where you would
like the new variable to appear. Click on the Edit menu and choose Insert Variable.
An empty column will appear in which you can enter the data of the new variable.
Move an existing variable(s)
In the Data Editor window, have the Variable View showing. Highlight the variable
you wish to move by clicking in the left-hand margin. Click and hold your left mouse
button and then drag the variable to the new position (a red line will appear as you
drag). Release the left mouse button when you get to the desired spot.
DATA ENTRY USING EXCEL
Data files can be prepared in the Microsoft Excel program and then imported into
SPSS for analysis. This is great for students who don’t have access to SPSS at
home. Excel usually comes as part of the Microsoft Office package; check under All
Programs in your Start menu. The procedure for creating a data file in Excel and
then importing it into SPSS is described below. If you intend to use this option you
should have at least a basic understanding of Excel, as this will not be covered here.
Warning: older versions of Excel (files with a .xls extension) can cope with only 256
columns of data (or variables). If your data file is likely to be larger than this, it is
probably easier to set it up in SPSS rather than convert from Excel to SPSS later.
Alternatively, you can use different Excel spreadsheets (each with the ID as the first
variable), convert each to SPSS separately, then merge the files in SPSS later (see the
instructions on merging files later in this chapter).
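When a large data set has to be spread over several worksheets, each sheet needs the ID plus as many other variables as will fit. The Python sketch below uses a hypothetical helper and an invented list of 600 items to work out the groups, assuming the 256-column limit and counting the ID column as one of the 256.

```python
def split_into_sheets(variables, id_name="id", max_columns=256):
    """Group variables so each worksheet holds the ID plus as many others as fit."""
    others = [name for name in variables if name != id_name]
    per_sheet = max_columns - 1  # one column on every sheet is taken by the ID
    return [
        [id_name] + others[start:start + per_sheet]
        for start in range(0, len(others), per_sheet)
    ]


variables = ["id"] + [f"item{n}" for n in range(1, 601)]  # invented: ID plus 600 items
sheets = split_into_sheets(variables)
print(len(sheets))                    # 3
print([len(sheet) for sheet in sheets])  # [256, 256, 91]
```

Because every sheet starts with the ID, the converted SPSS files can later be matched case by case on that variable.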
Step 1: Set up the variable names
Set up an Excel spreadsheet with the variable names in the first row across
the page. The variable names must conform to the SPSS rules for naming
variables (see Chapter 2).
Step 2: Enter the data
1. Enter the information for the first case on one line across the page, using
the appropriate columns for each variable.
2. Repeat for each of the remaining cases. Don’t use any formulas or other
Excel functions. Remember to save your fi le regularly.
3. Click on File, Save. In the section marked Save as Type, make sure
Microsoft Excel Workbook is selected. Type in an appropriate file name.
Step 3: Converting to SPSS
1. After you have entered the data, save your file and then close Excel.
2. Start SPSS and select File, Open, Data from the menu at the top of the
screen.
3. In the section labelled Files of type, choose Excel. Excel files have a .xls or
.xlsx extension. Find the file that contains your data. Click on it so that it
appears in the File name section.
4. Click on the Open button. A screen will appear labelled Opening Excel
Data Source. Make sure there is a tick in the box Read variable names
from the first row of data. Click on OK.
The data will appear on the screen with the variable names listed across the top. You
will then need to save this new SPSS fi le.
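The option Read variable names from the first row of data means that the first row supplies the variable names and every later row becomes one case. The Python sketch below shows the same idea using CSV text and the standard `csv` module as a stand-in for an Excel worksheet (reading .xlsx files directly would need a third-party library). The data are invented.

```python
import csv
import io

# Stand-in for a worksheet: first row = variable names, then one row per case.
sheet_text = """id,sex,age
1,1,34
2,2,21
"""

reader = csv.DictReader(io.StringIO(sheet_text))  # first row becomes the names
print(reader.fieldnames)  # ['id', 'sex', 'age']

# Every remaining row is one case; the text values are converted to numbers.
cases = [{name: int(value) for name, value in row.items()} for row in reader]
print(cases[0])  # {'id': 1, 'sex': 1, 'age': 34}
```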
Step 4: Saving as an SPSS file
1. Choose File, and then Save As from the menu at the top of the screen.
2. Type in a suitable file name. Make sure that the Save as Type is set at
SPSS Statistics (*.sav). Click on Save.
3. In the Data Editor, Variable View, you will now need to define the Label,
Values and Measure information for each variable (see instructions presented
earlier). You may also want to reduce the width of the columns as they
often come in from Excel with a width of 11.
When you wish to open this file later to analyse your data using SPSS, make sure you
choose the file that has a .sav extension (not your original Excel file that has a .xls
or .xlsx extension).
MERGE FILES
There are times when it is necessary to merge different data files. SPSS allows you to
merge files by adding additional cases at the end of your file, or to merge additional
variables for each of the cases in an existing data file (e.g. when Time 2 data becomes
available). This second option is also useful when you have Excel files with
information spread across different spreadsheets that need to be merged by ID.
To merge files by adding cases
This procedure will allow you to merge files that have the same variables, but
different cases; for example, where the same information is recorded at two different
sites (e.g. clinic settings) or entered by two different people. The two files should have
the same variable names for the data you wish to merge (although other non-equivalent
information can exist in each file).
If the ID numbers used in each file are the same (starting at ID=1, 2, 3), you will
need to change the ID numbers in one of the files before merging so that each case
is still uniquely identified. To do this, open one of the files, choose Transform from
the menu, and then Compute Variable. Type ID in the Target Variable box, and then
ID + 1000 in the Numeric Expression box (or some other number that is at least as large
as the highest ID number in the other file). Click on the OK button, and then on OK in the
dialogue box that asks if you wish to change the existing variable. This will create new
ID numbers for this file starting at 1001, 1002 and so on. Note this in your codebook for
future reference. Then you are ready to merge the files.
1. Open the first file that you wish to merge.
2. Go to the Data menu, choose Merge Files and then Add Cases.
3. In the dialogue box, click on An external SPSS data file and choose the
file that you wish to merge with. (If your second file is already open it will
be listed in the top box, An open dataset.)
4. Click on Continue and then on OK. Save the new data file using a
different name (File, Save As).
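The logic of renumbering one file and then adding its cases can be sketched in a few lines of Python. This is only an illustration of the idea, not SPSS syntax and not how SPSS implements Add Cases: it assumes each data file is held as a list of dictionaries (one per case), and the function names `offset_ids` and `add_cases` and the example data are hypothetical.

```python
# Illustrative sketch only: each "data file" is a list of dicts, one dict per case.

def offset_ids(cases, offset=1000, id_var="id"):
    """Return copies of the cases with `offset` added to the ID (like ID = ID + 1000)."""
    return [{**case, id_var: case[id_var] + offset} for case in cases]


def add_cases(first, second, id_var="id"):
    """Append the cases of `second` after those of `first`, refusing duplicate IDs."""
    merged = first + second
    ids = [case[id_var] for case in merged]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate ID numbers: renumber one file before merging")
    return merged


# Hypothetical example: two sites that both numbered their cases from 1.
site_a = [{"id": 1, "sex": 1}, {"id": 2, "sex": 2}]
site_b = [{"id": 1, "sex": 2}, {"id": 2, "sex": 1}]

merged = add_cases(site_a, offset_ids(site_b))
print([case["id"] for case in merged])  # [1, 2, 1001, 1002]
```

Calling `add_cases(site_a, site_b)` without the offset raises an error, which is exactly the situation the renumbering step is meant to prevent.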
To merge files by adding variables
This option is useful when adding additional information for each case (with the
matching IDs). Each file must start with the ID number.
1. Sort each file in ascending order by ID: click on the Data menu,
choose Sort Cases and choose ID.
2. Go to the Data menu, choose Merge Files and then Add Variables.
3. In the dialogue box, click on An external SPSS data file and choose the
file that you wish to merge with. (If your second file is already open it will
be listed in the top box, An open dataset.)
4. In the Excluded variables box, you should see the ID variable listed
(because it exists in both data files). (If you have any other variables listed
here, you will need to click on the Rename button to change the variable
name so that it is unique.)
5. Click on the ID variable, tick the box Match cases on key variables
and click on the arrow button to move ID into the Key Variables box. This means
that all information will be matched by ID. Click on Continue and then OK.
6. Save your merged file under a different name (File, Save As).
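Matching cases on a key variable can be sketched the same way. The Python below is a hypothetical illustration, not SPSS's own algorithm: it assumes ID values are unique within each file, checks the two requirements from the steps above (files sorted by ID, no shared variable names other than ID), and keeps cases found in only one file, filling the other file's variables with `None` as a stand-in for the system-missing value. Check the options in your version of SPSS for how it treats unmatched cases.

```python
# Illustrative sketch only: each "data file" is a list of dicts, one dict per case,
# and ID values are assumed to be unique within each file.

def add_variables(first, second, key="id"):
    """Match cases from two files on `key`, keeping cases found in either file."""
    # Mirrors step 1: both files should be sorted in ascending order by the key.
    for label, cases in (("first", first), ("second", second)):
        keys = [case[key] for case in cases]
        if keys != sorted(keys):
            raise ValueError(f"{label} file is not sorted by {key}")

    first_vars = sorted({v for case in first for v in case if v != key})
    second_vars = sorted({v for case in second for v in case if v != key})
    # Mirrors step 4: any shared name other than the key must be renamed first.
    shared = set(first_vars) & set(second_vars)
    if shared:
        raise ValueError(f"rename these variables before merging: {sorted(shared)}")

    lookup_1 = {case[key]: case for case in first}
    lookup_2 = {case[key]: case for case in second}
    merged = []
    for k in sorted(lookup_1.keys() | lookup_2.keys()):
        row = {key: k}
        # None stands in for a missing value when a case is absent from one file.
        row.update({v: lookup_1.get(k, {}).get(v) for v in first_vars})
        row.update({v: lookup_2.get(k, {}).get(v) for v in second_vars})
        merged.append(row)
    return merged


# Hypothetical Time 1 and Time 2 files.
time1 = [{"id": 1, "stress1": 20}, {"id": 2, "stress1": 25}]
time2 = [{"id": 1, "stress2": 18}, {"id": 3, "stress2": 30}]
for row in add_variables(time1, time2):
    print(row)
```

Here case 2 has no Time 2 score and case 3 has no Time 1 score, so each ends up with one missing value.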
40 Preparing the data le
USEFUL SPSS FEATURES
There are many useful features of SPSS that can be used to help with analyses, and to
save you time and effort. I have highlighted a few of the main ones in the following
sections.
Sort the data file
You can ask SPSS to sort your data file according to values on one of your variables
(e.g. sex, age).
1. Click on the Data menu, choose Sort Cases and specify which variable will
be used to sort by. Choose either Ascending or Descending. Click on OK.
2. To return your file to its original order, repeat the process, asking SPSS to
sort the file by ID.
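A short Python sketch shows why sorting by ID afterwards restores the original order. It assumes (hypothetically) that cases are dictionaries, that every case has a unique ID, and that the sort variable has no missing values; `sort_cases` is an illustrative name, not an SPSS function.

```python
# Illustrative sketch only: cases are dicts; assumes no missing values on the sort variable.

def sort_cases(cases, variable, descending=False):
    """Return a new list of cases ordered by `variable` (like Data, Sort Cases)."""
    return sorted(cases, key=lambda case: case[variable], reverse=descending)


cases = [{"id": 1, "age": 34}, {"id": 2, "age": 19}, {"id": 3, "age": 52}]
by_age = sort_cases(cases, "age", descending=True)
print([case["id"] for case in by_age])    # [3, 1, 2]
restored = sort_cases(by_age, "id")       # back to the original ID order
print([case["id"] for case in restored])  # [1, 2, 3]
```

Restoring the order only works because each case carries a unique ID, which is one more reason to include an ID variable in every data file.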
Split the data file
Sometimes it is necessary to split your file and to repeat analyses separately for groups
(e.g. males and females). This procedure does not physically alter your file in any
permanent manner. It is an option you can turn on and off as it suits your purposes. The
order in which the cases are displayed in the data file will change, however. You can return
the data file to its original order (by ID) by using the Sort Cases command described above.
1. Click on the Data menu and choose the Split File option.
2. Click on Compare groups and specify the grouping variable (e.g. sex).
Click on OK.
For the analyses that you perform after this split file procedure, the two groups (in this
case, males and females) will be analysed separately.
Important: when you have finished the analyses, you need to go back and turn the
Split File option off.
1. Click on the Data menu and choose the Split File option.
2. Click on the first option (Analyze all cases, do not create groups). Click on OK.
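Conceptually, Split File runs the same analysis once per group. The Python sketch below illustrates that idea for a group mean; the function name and data are hypothetical, `None` stands in for a missing value, and sex is coded 1=male, 2=female as in the codebook. Unlike SPSS, this sketch does not reorder the cases.

```python
from collections import defaultdict
from statistics import mean


def split_file_means(cases, group_var, analysis_var):
    """Compute the mean of `analysis_var` separately for each value of `group_var`."""
    groups = defaultdict(list)
    for case in cases:
        value = case[analysis_var]
        if value is not None:  # None stands in for a missing value
            groups[case[group_var]].append(value)
    return {group: mean(values) for group, values in sorted(groups.items())}


# Hypothetical data: sex coded 1=male, 2=female.
cases = [
    {"id": 1, "sex": 1, "tpstress": 20},
    {"id": 2, "sex": 2, "tpstress": 24},
    {"id": 3, "sex": 1, "tpstress": 30},
    {"id": 4, "sex": 2, "tpstress": None},
]
print(split_file_means(cases, "sex", "tpstress"))  # {1: 25, 2: 24}
```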
Select cases
For some analyses, you may wish to select a subset of your sample (e.g. only males).
1. Click on the Data menu and choose the Select Cases option.
2. Click on the If condition is satisfied button.
3. Click on the button labelled IF.
4. Choose the variable that defines the group that you are interested in (e.g. sex).
5. Click on the arrow button to move the variable name into the box. Click
on the = key from the keypad displayed on the screen.
6. Type in the value that corresponds to the group you are interested in
(check with your codebook). For example, males in this sample are coded
1, therefore you would type in 1. The command line should read: sex=1.
7. Click on Continue and then OK.
For the analyses (e.g. correlation) that you perform after this Select Cases procedure,
only the group that you selected (e.g. males) will be included.
Important: when you have finished the analyses, you need to go back and turn the
Select Cases option off; otherwise it will apply to all subsequent analyses.
1. Click on the Data menu and choose the Select Cases option.
2. Click on the first option, All cases. Click on OK.
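The condition sex=1 works as a filter applied to every case. The hypothetical Python sketch below expresses the same idea; it leaves the full list untouched, so "turning Select Cases off" simply means going back to analysing `cases`. The function name and data are illustrative only.

```python
def select_cases(cases, condition):
    """Return only the cases for which `condition(case)` is true (like IF sex=1)."""
    return [case for case in cases if condition(case)]


# Hypothetical data: males coded 1, females coded 2.
cases = [
    {"id": 1, "sex": 1, "age": 34},
    {"id": 2, "sex": 2, "age": 45},
    {"id": 3, "sex": 1, "age": 27},
]
males = select_cases(cases, lambda case: case["sex"] == 1)
print([case["id"] for case in males])  # [1, 3]
print(len(cases))  # 3: the full data set is still available
```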
USING SETS
With large data files, it can be a pain to have to scroll through lots of variable names
in SPSS dialogue boxes to reach the ones that you want to analyse. SPSS allows you to
define and use ‘sets’ of variables. This is particularly useful in the survey4ED.sav data
file, where there are lots of individual items that are added to give total scores, which
are located at the end of the file. In the following example, I will establish a set that
includes only the demographic variables and the scale totals.
1. Click on Utilities from the menu and choose Define Variable Sets.
2. Choose the variables you want in your set from the list. Include ID, the
demographic variables (sex through to smoke number), and then all the
totals at the end of the data file from Total Optimism onwards. Move
these into the Variables in Set box.
3. In the box Set Name, type an appropriate name for your set (e.g. Totals).
4. Click on the Add Set button and then on Close.
To use the sets you have created, you need to activate them.
1. Click on Utilities and on Use Variable Sets.
2. In the list of variable sets, tick the set you have created (Totals) and then
go up and untick the ALLVARIABLES option, as this would display all
variables. Leave NEWVARIABLES ticked. Click on OK.
With the sets activated, only the selected variables will be displayed in the data file and
in the dialogue boxes used to conduct statistical analyses.
To turn the option off
1. Click on Utilities and on Use Variable Sets.
2. Tick the ALLVARIABLES option and click OK.
Data file comments
Under the Utilities menu, SPSS provides you with the chance to save descriptive
comments with a data file.
1. Select Utilities and Data File Comments.
2. Type in your comments, and if you would like them recorded in the
output fi le, click on the option Display comments in output. Comments
are saved with the date they were made.
Display value labels in the data file
When the data fi le is displayed in the Data Editor window, the numerical values for
all variables are usually shown. If you would like the value labels (e.g. male, female)
displayed instead, go to the View menu and choose Value Labels. To turn this option
off, go to the View menu and click on Value Labels again to remove the tick.
5
Screening and cleaning the data
Before you start to analyse your data, it is essential that you check your data set for errors.
It is very easy to make mistakes when entering data, and unfortunately some errors can
completely mess up your analyses. For example, entering 35 when you mean to enter 3
can distort the results of a correlation analysis. Some analyses are very sensitive to what
are known as ‘outliers’; that is, values that are well below or well above the other scores.
So it is important to spend the time checking for mistakes initially, rather than trying to
repair the damage later. Although boring, and a threat to your eyesight if you have large
data sets, this process is essential and will save you a lot of heartache later!
The data screening process involves a number of steps:
Step 1: Checking for errors. First, you need to check each of your variables for
scores that are out of range (i.e. not within the range of possible scores).
Step 2: Finding and correcting the error in the data file. Second, you need to find
where in the data file this error occurred (i.e. which case is involved) and correct
or delete the value.
To give you the chance to practise these steps, I have created a modified data file
(error4ED.sav) provided on the website accompanying this book (this is based on the
main file survey4ED.sav; see details on p. viii and in the Appendix). To follow along,
you will need to start SPSS and open the error4ED.sav file. In working through each of
the steps on the computer you will become more familiar with the use of menus,
interpreting the output from SPSS analyses and manipulating your data file. For each of the
procedures, I have included the SPSS syntax. For more information on the use of the
Syntax Editor for recording and saving the SPSS commands, see Chapter 3.
Before you start, you should go to the Edit menu and choose Options. Under the
Output Labels tab, go down to the final box (Variable values in labels shown as:) and
choose Values and Labels. This will allow you to display both the values and labels
used for each of your categorical variables, making identification of errors easier.
STEP 1: CHECKING FOR ERRORS
When checking for errors, you are primarily looking for values that fall outside the
range of possible values for a variable. For example, if sex is coded 1=male, 2=female,
you should not find any scores other than 1 or 2 for this variable. Scores that fall
outside the possible range can distort your statistical analyses, so it is very important
that all these errors are corrected before you start. To check for errors, you will need
to inspect the frequencies for each of your variables. This includes all of the individual
items that make up the scales. Errors must be corrected before total scores for these
scales are calculated. It is a good idea to keep a log book where you record any errors
that you detect and any changes that you make to your data file.
There are a number of different ways to check for errors using SPSS. I will illustrate
two different ways, one that is more suitable for categorical variables (e.g. sex)
and the other for continuous variables (e.g. age).
Checking categorical variables
In this section, the procedure for checking categorical variables for errors is presented.
In the example shown below, I will illustrate the process using the error4ED.sav
data file (included on the website accompanying this book; see p. viii), checking for
errors on the variables Sex, Marital status and Highest education completed. Some
deliberate errors have been introduced in the error4ED.sav data file so that you
can get practice spotting them; they are not present in the main survey4ED.sav
data file.
Procedure for checking categorical variables
1. From the main menu at the top of the screen, click on Analyze, then click
on Descriptive Statistics, then Frequencies.
2. Choose the variables that you wish to check (e.g. sex, marital, educ.).
3. Click on the arrow button to move these into the Variable box.
4. Click on the Statistics button. Tick Minimum and Maximum in the
Dispersion section.
5. Click on Continue and then on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
FREQUENCIES
VARIABLES=sex marital educ
/STATISTICS=MINIMUM MAXIMUM
/ORDER= ANALYSIS .
Selected output generated using this procedure is displayed as follows.
There are two parts to the output. The first table provides a summary of each of the
variables you requested. The remaining tables give you a breakdown, for each variable,
of the range of responses. (These are listed using the value label and the code number
that was used if you changed the Options as suggested earlier in this chapter.)
• Check your Minimum and Maximum values. Do they make sense? Are they
within the range of possible scores on that variable? You can see from the first
table (labelled Statistics) that, for the variable Sex, the minimum value is 1 and
the maximum is 3. This maximum is incorrect, as the maximum value should only be
2 according to the codebook in the Appendix. For marital status, the scores are
within the appropriate range of 1 to 8. The maximum value for highest educ is 22,
indicating an error, as the maximum value should only be 6.
• Check the number of Valid and Missing cases. If there are a lot of missing cases, you
need to ask why. Have you made errors in entering the data (e.g. put the data in the
wrong columns)? Sometimes extra cases appear at the bottom of the data file, where
you may have moved your cursor too far down and accidentally created some ‘empty’
cases. If this occurs, open your Data Editor window, move down to the empty case
row, click in the shaded area where the case number appears and press Delete on your
keyboard. Rerun the Frequencies procedure to get the correct values.
Other tables are also presented in the output, corresponding to each of the variables
that were investigated. In these tables, you can see how many cases fell into
each of the legitimate categories. It also shows how many cases have out-of-range
values. There is one case with a value of 3 for sex, and one person with a value of
22 for education. We will need to find out where these errors occurred, but first
we will demonstrate how to check for errors in some of the continuous variables
in the data file.
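The checks described above (minimum, maximum, valid and missing counts, and a frequency table) can be sketched in Python to show what is being looked for. This is a hypothetical illustration, not the Frequencies procedure itself: `check_categorical` and the responses are made up, `None` stands in for a missing value, and the legal codes (1 and 2 for sex) come from the codebook.

```python
from collections import Counter


def check_categorical(values, legal_codes):
    """Summarise a categorical variable and flag codes not listed in `legal_codes`."""
    valid = [v for v in values if v is not None]  # None stands in for missing
    counts = Counter(valid)
    frequencies = dict(sorted(counts.items()))
    return {
        "valid": len(valid),
        "missing": len(values) - len(valid),
        "minimum": min(valid),
        "maximum": max(valid),
        "frequencies": frequencies,
        "out_of_range": {
            code: n for code, n in frequencies.items() if code not in legal_codes
        },
    }


# Hypothetical responses for sex, coded 1=male, 2=female.
sex = [1, 2, 2, 1, 3, 2, None]
report = check_categorical(sex, legal_codes={1, 2})
print(report["minimum"], report["maximum"])  # 1 3
print(report["out_of_range"])                # {3: 1}
```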
Checking continuous variables
Procedure for checking continuous variables
1. From the menu at the top of the screen, click on Analyze, then click on
Descriptive Statistics, then Descriptives.
2. Click on the variables that you wish to check. Click on the arrow button to
move them into the Variables box (e.g. age).
3. Click on the Options button. You can ask for a range of statistics. The
main ones at this stage are mean, standard deviation, minimum and
maximum. Click on the statistics you wish to generate.
4. Click on Continue, and then on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
DESCRIPTIVES
VARIABLES=age
/STATISTICS=MEAN STDDEV MIN MAX .
The output generated from this procedure is shown as follows.
• Check the Minimum and Maximum values. Do these make sense? In this case, the
ages range from 2 to 82. The minimum value suggests an error (given this was an
adult-only sample).
• Does the Mean score make sense? If there is an out-of-range value in the data file,
this will distort the mean value. If the variable is the total score on a scale, is the
mean value what you expected from previous research on this scale?
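The Python sketch below illustrates both points: it reports the same statistics as the Descriptives procedure and flags implausible values, and it shows how a single error pulls the mean down. The plausible age range of 18 to 100 is an assumption for an adult-only sample, and the function name and ages are hypothetical.

```python
from statistics import mean, stdev


def describe(values, lowest, highest):
    """Mean, SD, minimum and maximum, plus any values outside [lowest, highest]."""
    valid = [v for v in values if v is not None]  # None stands in for missing
    return {
        "n": len(valid),
        "mean": mean(valid),
        "sd": stdev(valid),  # sample standard deviation (n - 1 denominator)
        "minimum": min(valid),
        "maximum": max(valid),
        "implausible": sorted(v for v in valid if not lowest <= v <= highest),
    }


# Hypothetical ages from an adult-only sample, with one data-entry error (2).
age = [34, 2, 45, 82, 27]
summary = describe(age, lowest=18, highest=100)
print(summary["minimum"], summary["maximum"], summary["mean"])  # 2 82 38
print(summary["implausible"])                                   # [2]
print(mean([34, 45, 82, 27]))  # 47 once the erroneous value is removed
```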
STEP 2: FINDING AND CORRECTING THE ERROR IN THE
DATA FILE
So what do we do if we find some out-of-range responses (e.g. a value of 3 for sex)?
First, we need to find the error in the data file. Don’t try to scan through your entire
data set looking for the error; there are much quicker ways to find an error
in a data file. I will illustrate two approaches.
Method 1
1. Click on the Data menu and choose Sort Cases.
2. In the dialogue box that pops up, click on the variable that you know
has an error (e.g. sex) and then on the arrow to move it into the Sort By
box. Click on either ascending or descending (depending on whether you
want the higher values at the top or the bottom). For sex, we want to
find the person with the value of 3, so we would choose descending.
3. Click on OK.
In the Data Editor window, make sure that you have selected the Data View tab
so that you can see your data values. The case with the error for your selected variable
(e.g. sex) should now be located at the top of your data file. Look across to the sex
variable column. In this example, you will see that the first case listed (ID=103) has
a value of 3 for sex. If this was your data, you would need to access the original
questionnaires and check whether the person with an identification number of 103 was
a male or female. You would then delete the value of 3 and type in the correct value.
Record this information in your log book. If you don’t have access to the original data,
you should delete the value and let SPSS replace it with the system missing code (it
will show as a full stop; this happens automatically, so don’t type a full stop).
When you find an error in your data file, it is important that you check for other
errors in the surrounding columns. In this example, notice that the inappropriate
value of 2 for age also belongs to person ID=103.
Shown below is another way that we could have found the case that had an error
for sex.
Method 2
1. Make sure that the Data Editor window is open and on the screen with
the data showing.
2. Click on the variable name in which the error has occurred (e.g. sex).
3. Click once to highlight the column.
4. Click on Edit from the menu across the top of the screen. Click on Find.
5. In the Find box, type in the incorrect value that you are looking for (e.g. 3).
6. Click on Find Next. SPSS will scan through the file and will stop at the first
occurrence of the value that you specified. Take note of the ID number of
this case (from the first column). You will need this to check your records
or questionnaires to find out what the value should be.
7. Click on Find Next again if you need to continue searching for other
cases with the same incorrect value. In this example, we know from the
Frequencies output that there is only one incorrect value of 3.
8. Click on Close when you have finished searching.
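Both methods amount to locating every case whose value is outside the legal codes and noting its ID. A hypothetical Python sketch of that search is shown below; it returns whole cases rather than just IDs, so the neighbouring columns (here, the implausible age) are visible as well. ID 103 mirrors the example in the text; the other cases are made up.

```python
def find_cases(cases, variable, legal_codes):
    """Return whole cases whose `variable` holds a non-missing, out-of-range value."""
    return [
        case for case in cases
        if case[variable] is not None and case[variable] not in legal_codes
    ]


cases = [
    {"id": 101, "sex": 1, "age": 34},
    {"id": 102, "sex": 2, "age": 45},
    {"id": 103, "sex": 3, "age": 2},
]
for case in find_cases(cases, "sex", legal_codes={1, 2}):
    # Printing the whole case shows the surrounding columns too.
    print(case)  # {'id': 103, 'sex': 3, 'age': 2}
```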
After you have corrected your errors, it is essential to repeat Frequencies to double-
check. Sometimes, in correcting one error you may have accidentally caused another
error. Although this process is tedious, it is very important that you start with a clean,
error-free data set. The success of your research depends on it. Don’t cut corners!
CASE SUMMARIES
One other aspect of SPSS that may be useful in this data screening process is Summarize
Cases. This allows you to select and display specific pieces of information for
each case.
1. Click on Analyze, go to Reports and choose Case Summaries.
2. Choose the ID variable and other variables you are interested in (e.g. sex,
child, smoker). Remove the tick from Limit cases to first 100.
3. Click on the Statistics button and remove Number of cases from the Cell
Statistics box. Click on Continue.
4. Click on the Options button and remove the tick from Subheadings for
totals.
5. Click on Continue and then on OK (or on Paste to save to Syntax Editor).
The syntax from this procedure is:
SUMMARIZE
/TABLES=id sex child smoke
/FORMAT=VALIDLIST NOCASENUM NOTOTAL
/TITLE='Case Summaries'
/MISSING=VARIABLE
/CELLS=NONE.
Part of the output is shown below.
In this chapter, we have checked for errors in only a few of the variables in the data
file to illustrate the process. For your own research, you would obviously check every
variable in the data file. If you would like some more practice finding errors, repeat
the procedures described above for all the variables in the error4ED.sav data file. I
have deliberately included a few errors to make the process more meaningful. Refer
to the codebook in the Appendix for survey4ED.sav to fi nd out what the legitimate
values for each variable should be.
For additional information on the screening and cleaning process, I would strongly
recommend you read Chapter 4 in Tabachnick and Fidell (2007).
PART THREE
Preliminary analyses
Once you have a clean data file, you can begin the process of inspecting your data file and
exploring the nature of your variables. This is in readiness for conducting specific
statistical techniques to address your research questions. There are five chapters that make
up Part Three of this book. In Chapter 6, the procedures required to obtain descriptive
statistics for both categorical and continuous variables are presented. This chapter also
covers checking the distribution of scores on continuous variables in terms of normality
and possible outliers. Graphs can be useful tools when getting to know your data. Some
of the more commonly used graphs available through SPSS are presented in Chapter 7.
Sometimes manipulation of the data file is needed to make it suitable for specific
analyses. This may involve calculating the total score on a scale by adding up the scores
obtained on each of the individual items. It may also involve collapsing a continuous
variable into a smaller number of categories. These data manipulation techniques are
covered in Chapter 8. In Chapter 9, the procedure used to check the reliability (internal
consistency) of a scale is presented. This is particularly important in survey research,
or in studies that involve the use of scales to measure personality characteristics,
attitudes, beliefs etc. Also included in Part Three is a chapter that guides you through the
process of deciding which statistical technique is suitable to address
your research question. In Chapter 10, you are provided with an overview of some of
the statistical techniques available in SPSS and led step by step through the process
of deciding which one would suit your needs. Important aspects that you need to
consider (e.g. type of question, data type, characteristics of the variables) are highlighted.
6
Descriptive statistics
Once you are sure there are no errors in the data file (or at least no out-of-range values
on any of the variables), you can begin the descriptive phase of your data analysis.
Descriptive statistics have a number of uses. They can be used to:
• describe the characteristics of your sample in the Method section of your report
• check your variables for any violation of the assumptions underlying the statistical
techniques that you will use to address your research questions
• address specific research questions.
The two procedures outlined in Chapter 5 for checking the data will also give you
information for describing your sample in the Method section of your report.
In studies involving human participants, it is useful to collect information on the
number of people or cases in the sample, the number and percentage of males and
females in the sample, the range and mean of ages, education level, and any other
relevant background information. Prior to doing many of the statistical analyses (e.g.
t-test, ANOVA, correlation), it is important to check that you are not violating any
of the ‘assumptions’ made by the individual tests. (These are covered in detail in Part
Four and Part Five of this book.)
Testing of assumptions usually involves obtaining descriptive statistics on your
variables. These descriptive statistics include the mean, standard deviation, range
of scores, skewness and kurtosis. Descriptive statistics can be obtained in a number of
different ways, providing a variety of information.
If all you want is a quick summary of the characteristics of the variables in your
data file, you can use a relatively new feature of SPSS (this may not be available if using
earlier versions of SPSS).
Procedure for obtaining codebook
1. Click on Analyze, go to Reports and choose Codebook.
2. Select the variables you want and move them into the Codebook
Variables box.
3. Click on the Output tab and untick (by clicking on the box with a tick) all
the options except Label, Value Labels and Missing Values.
4. Click on the Statistics tab and make sure that all the options in both
sections are ticked.
5. Click on Continue, and then OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
CODEBOOK sex [n] age [s]
/VARINFO LABEL VALUELABELS MISSING
/OPTIONS VARORDER=VARLIST SORT=ASCENDING MAXCATS=200
/STATISTICS COUNT PERCENT MEAN STDDEV QUARTILES.
The output is shown below.
The output from the procedure shown above gives you a quick summary of the
cases in your data file. Often, however, you need more detailed information. This
can be obtained using the Frequencies, Descriptives or Explore procedures. These
are all procedures listed under the Analyze, Descriptive Statistics drop-down menu.
There are, however, different procedures depending on whether you have a categorical
or continuous variable. Some of the statistics (e.g. mean, standard deviation) are
not appropriate if you have a categorical variable. The different approaches to be
used with categorical and continuous variables are presented in the following two
sections. If you would like to follow along with the examples in this chapter, open the
survey4ED.sav file.
CATEGORICAL VARIABLES
To obtain descriptive statistics for categorical variables, you should use Frequencies.
This will tell you how many people gave each response (e.g. how many males, how
many females). It doesn’t make any sense asking for means, standard deviations etc.
for categorical variables, such as sex or marital status.
Procedure for obtaining descriptive statistics for categorical variables
1. From the menu click on Analyze, then click on Descriptive Statistics, then
Frequencies.
2. Choose and highlight the categorical variables you are interested in
(e.g. sex). Move these into the Variables box.
3. Click on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
FREQUENCIES
VARIABLES=sex
/ORDER= ANALYSIS .
The output is shown below.
Interpretation of output from Frequencies
From the output shown above, we know that there are 185 males (42.1 per cent)
and 254 females (57.9 per cent) in the sample, giving a total of 439 respondents. It is
important to take note of the number of respondents you have in different subgroups
in your sample. For some analyses (e.g. ANOVA), it is easier to have roughly equal
group sizes. If you have very unequal group sizes, particularly if the group sizes are
small, it may be inappropriate to run some analyses.
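The percentages in the Frequencies table are simply each count divided by the total, multiplied by 100 and rounded to one decimal place. The Python sketch below reproduces the figures quoted above; it assumes no missing values on sex, so the Percent and Valid Percent columns would coincide. `frequency_table` is a hypothetical helper name.

```python
def frequency_table(counts):
    """Return (total, {category: (n, percent rounded to one decimal place)})."""
    total = sum(counts.values())
    table = {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}
    return total, table


total, table = frequency_table({"males": 185, "females": 254})
print(total)  # 439
print(table)  # {'males': (185, 42.1), 'females': (254, 57.9)}
```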
CONTINUOUS VARIABLES
For continuous variables (e.g. age) it is easier to use Descriptives, which will provide
you with ‘summary’ statistics such as the mean, standard deviation, minimum and
maximum. You certainly don’t want every single value listed, as this may involve hundreds
of values for some variables. You can collect the descriptive information on all your
continuous variables in one go; it is not necessary to do it variable by variable. Just transfer
all the variables you are interested in into the box labelled Variables. If you have a lot
of variables, however, your output will be extremely long. Sometimes it is easier to do
them in chunks and tick off each group of variables as you do them.
Procedure for obtaining descriptive statistics for continuous variables
1. From the menu click on Analyze, then select Descriptive Statistics, then
Descriptives.
2. Click on all the continuous variables that you wish to obtain descriptive
statistics for. Click on the arrow button to move them into the Variables
box (e.g. age, Total perceived stress: tpstress).
3. Click on the Options button. Make sure mean, standard deviation,
minimum, maximum are ticked and then click on skewness, kurtosis.
4. Click on Continue, and then OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
DESCRIPTIVES
VARIABLES=age tpstress
/STATISTICS=MEAN STDDEV MIN MAX KURTOSIS SKEWNESS .
The output generated from this procedure is shown below.
Interpretation of output from Descriptives
In the output presented above, the information we requested for each of the vari-
ables is summarised. For example, for the variable age we have information from
439 respondents, ranging in age from 18 to 82 years, with a mean of 37.44 and
standard deviation of 13.20. This information may be needed for the Method section
of a report to describe the characteristics of the sample.
Descriptives also provides some information concerning the distribution of
scores on continuous variables (skewness and kurtosis). This information may
be needed if these variables are to be used in parametric statistical techniques
(e.g. t-tests, analysis of variance). The Skewness value provides an indication of
the symmetry of the distribution. Kurtosis, on the other hand, provides information
about the peakedness of the distribution. If the distribution is perfectly
normal, you would obtain a skewness and kurtosis value of 0 (rather an uncommon
occurrence in the social sciences).
Positive skewness values indicate positive skew (scores clustered to the left at the
low values). Negative skewness values indicate a clustering of scores at the high end
(right-hand side of a graph). Positive kurtosis values indicate that the distribution is
rather peaked (clustered in the centre), with long thin tails. Kurtosis values below 0
indicate a distribution that is relatively flat (too many cases in the extremes). ‘With
reasonably large samples, skewness will not make a substantive difference in the
analysis’ (Tabachnick & Fidell 2007, p. 80). Kurtosis can result in an underestimate of
the variance, but this risk is also reduced with a large sample (200+ cases: see
Tabachnick & Fidell 2007, p. 80).
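To see how the sign of these values arises, the two statistics can be computed by hand. SPSS does not print its formulas in the output; the Python sketch below assumes the bias-adjusted sample formulas (the same ones used by Excel's SKEW and KURT), which is my understanding of what SPSS reports. Software that uses unadjusted versions will give slightly different numbers. The function names and example data are illustrative.

```python
from statistics import mean, stdev


def skewness(values):
    """Bias-adjusted sample skewness (0 for a perfectly symmetrical distribution)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def kurtosis(values):
    """Bias-adjusted sample excess kurtosis (about 0 for a normal distribution)."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


print(skewness([1, 2, 3, 4, 5]))           # approximately 0 (symmetrical)
print(round(kurtosis([1, 2, 3, 4, 5]), 6))  # -1.2 (flat, no central peak)
print(skewness([1, 1, 1, 2, 10]) > 0)       # True (scores bunched at the low end)
```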
While there are tests that you can use to evaluate skewness and kurtosis values,
these are too sensitive with large samples. Tabachnick and Fidell (2007, p. 81)
recommend inspecting the shape of the distribution (e.g. using a histogram). The
procedure for further assessing the normality of the distribution of scores is provided later
in this section.
MISSING DATA
When you are doing research, particularly with human beings, it is rare that you will
obtain complete data from every case. It is important that you inspect your data file
for missing data. Run Descriptives and find out what percentage of values is missing
for each of your variables. If you find a variable with a lot of unexpected missing data,
you need to ask yourself why. You should also consider whether your missing values
are happening randomly, or whether there is some systematic pattern (e.g. lots of
women over 30 years of age failing to answer the question about their age!).
You also need to consider how you will deal with missing values when you come
to do your statistical analyses. The Options button in many of the SPSS statistical
procedures offers you choices for how you want to deal with missing data. It is
important that you choose carefully, as it can have dramatic effects on your results. This is
particularly important if you are including a list of variables and repeating the same
analysis for all variables (e.g. correlations among a group of variables, t-tests for a
series of dependent variables).
• The Exclude cases listwise option will include cases in the analysis only if they
have full data on all of the variables listed in your Variables box for that case. A
case will be totally excluded from all the analyses if it is missing even one piece of
information. This can severely, and unnecessarily, limit your sample size.
• The Exclude cases pairwise option, however, excludes the case (person) only
if they are missing the data required for the specific analysis. They will still be
included in any of the analyses for which they have the necessary information.
• The Replace with mean option, which is available in some SPSS statistical
procedures (e.g. multiple regression), calculates the mean value for the variable and
gives every missing case this value. This option should never be used, as it can
severely distort the results of your analysis, particularly if you have a lot of missing
values.
Always press the Options button for any statistical procedure you conduct, and check
which of these options is ticked (the default option varies across procedures). I would
suggest that you use pairwise exclusion of missing data, unless you have a pressing
reason to do otherwise. The only situation where you might need to use listwise
exclusion is when you want to refer only to a subset of cases that provided a full set of
results.
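The differences between these options are easiest to see with a tiny data set. The Python sketch below is a hypothetical illustration, not SPSS code: `None` stands in for a missing value, and the function names and data are made up. It counts the cases used under listwise and pairwise exclusion, and shows one reason mean replacement is risky: filling gaps with the mean shrinks the standard deviation.

```python
from statistics import mean, stdev


def percent_missing(cases, variable):
    """Percentage of cases with a missing (None) value on `variable`."""
    missing = sum(case[variable] is None for case in cases)
    return 100 * missing / len(cases)


def listwise_n(cases, variables):
    """Number of cases with complete data on every listed variable."""
    return sum(all(case[v] is not None for v in variables) for case in cases)


def pairwise_n(cases, var_a, var_b):
    """Number of cases with data on the two variables used in one specific analysis."""
    return sum(case[var_a] is not None and case[var_b] is not None for case in cases)


def replace_with_mean(values):
    """Fill missing values with the mean of the observed values (not recommended)."""
    fill = mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


cases = [
    {"a": 1, "b": 2, "c": None},
    {"a": 2, "b": None, "c": 3},
    {"a": 3, "b": 4, "c": 5},
    {"a": None, "b": 1, "c": 2},
]
print(percent_missing(cases, "a"))         # 25.0
print(listwise_n(cases, ["a", "b", "c"]))  # 1
print(pairwise_n(cases, "a", "b"))         # 2

scores = [2, 4, None, None, 6, 8]
observed = [v for v in scores if v is not None]
print(stdev(observed), stdev(replace_with_mean(scores)))  # SD shrinks from about 2.58 to 2.0
```

With listwise exclusion only one of the four cases survives, whereas each pairwise analysis keeps two.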
For more experienced users, there are more advanced and complex options available
in SPSS for estimating missing values (e.g. imputation). These are included in
the Missing Value Analysis procedure. This can also be used to detect patterns within
missing data.
Descriptive statistics 59
ASSESSING NORMALITY
Many of the statistical techniques presented in Part Four and Part Five of this book
assume that the distribution of scores on the dependent variable is ‘normal’. Normal
is used to describe a symmetrical, bell-shaped curve, which has the greatest frequency
of scores in the middle with smaller frequencies towards the extremes (see Gravet-
ter & Wallnau 2004, p. 48). Normality can be assessed to some extent by obtaining
skewness and kurtosis values (as described in the previous section). However, other
techniques are also available in SPSS using the Explore option of the Descriptive
Statistics menu. This procedure is detailed below. In this example, I will assess the
normality of the distribution of scores for Total perceived stress for the sample as a
whole. You also have the option of doing this separately for different groups in your
sample by specifying an additional categorical variable (e.g. sex) in the Factor List
option that is available in the Explore dialogue box.
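If you would like to see where skewness and kurtosis values come from, here is a minimal Python sketch of the bias-adjusted formulas (often labelled G1 and G2). I am assuming these match the values SPSS reports in its Descriptives output; compare against your own output before relying on that. Values close to 0 on both are consistent with a normal distribution.

```python
import math


def skewness_kurtosis(scores):
    """Bias-adjusted skewness and excess kurtosis (often written G1 and G2).

    Assumption: these are the adjusted formulas behind SPSS's Descriptives
    output. Values near 0 are consistent with a normal distribution.
    """
    n = len(scores)
    if n < 4:
        raise ValueError("need at least 4 scores")
    mean = sum(scores) / n
    # Central moments of the sample (divided by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in scores) / n
    if m2 == 0:
        raise ValueError("all scores are identical")
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    g1 = m3 / m2 ** 1.5       # unadjusted skewness
    g2 = m4 / m2 ** 2 - 3     # unadjusted excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt


print(skewness_kurtosis([1, 2, 3, 4, 5]))         # symmetric, flat-topped
print(skewness_kurtosis([1, 1, 1, 2, 2, 3, 10]))  # long tail to the right
```

For the symmetric list 1 to 5 the sketch gives a skewness of 0 and a kurtosis of about -1.2 (flatter than a normal curve); the second list, with one high score, gives a positive skewness.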
Procedure for assessing normality using Explore
1. From the menu at the top of the screen click on Analyze, then select
Descriptive Statistics, then Explore.
2. Click on the variable(s) you are interested in (e.g. Total perceived stress:
tpstress). Click on the arrow button to move them into the Dependent
List box.
3. In the Label Cases by: box, put your ID variable.
4. In the Display section, make sure that Both is selected.
5. Click on the Statistics button and click on Descriptives and Outliers. Click
on Continue.
6. Click on the Plots button. Under Descriptive, click on Histogram. Click on
Normality plots with tests. Click on Continue.
7. Click on the Options button. In the Missing Values section, click on
Exclude cases pairwise. Click on Continue and then OK (or on Paste to
save to Syntax Editor).
The syntax generated is:
EXAMINE
VARIABLES=tpstress
/ID= id
/PLOT BOXPLOT HISTOGRAM NPPLOT
/COMPARE GROUP
/STATISTICS DESCRIPTIVES EXTREME
/CINTERVAL 95
/MISSING PAIRWISE
/NOTOTAL.
Selected output generated from this procedure is shown below.
[Figure: selected Explore output for tpstress, including the Tests of normality table]
Interpretation of output from Explore
Quite a lot of information is generated as part of this output. This tends to be a bit
overwhelming until you know what to look for. I will take you through the output
step by step.
In the table labelled Descriptives, you are provided with descriptive statistics and
other information concerning your variables. If you specified a grouping variable
in the Factor List, this information will be provided separately for each group,
rather than for the sample as a whole. Some of this information you will recognise
(mean, median, std deviation, minimum, maximum etc.).
One statistic you may not know is the 5% Trimmed Mean. To obtain this
value, SPSS removes the top and bottom 5 per cent of your cases and calculates a
new mean value. If you compare the original mean (26.73) and this new trimmed
mean (26.64), you can see whether your extreme scores are having a strong influence
on the mean. If these two mean values are very different, you may need to
investigate these data points further. The ID values of the most extreme cases are
shown in the Extreme Values table.
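The 5% trimmed mean is easy to reproduce by hand. The Python sketch below is a simplified version that drops whole cases only (the floor of 5 per cent of N from each end); SPSS gives partial weight to a case when 5 per cent of N is not a whole number, so its value can differ slightly. The scores are hypothetical.

```python
import math


def trimmed_mean(scores, proportion=0.05):
    """Mean after dropping the top and bottom `proportion` of cases.

    Simplification: drops whole cases only (floor of proportion * n from
    each end). SPSS weights a partially trimmed case when proportion * n is
    not a whole number, so its 5% Trimmed Mean can differ slightly.
    """
    ordered = sorted(scores)
    k = math.floor(proportion * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


scores = list(range(1, 20)) + [100]  # 19 ordinary scores plus one extreme score
print(sum(scores) / len(scores))     # 14.5
print(trimmed_mean(scores))          # 10.5
```

One extreme score pulls the ordinary mean up to 14.5, whereas the trimmed mean stays at 10.5; a gap like this is the signal to investigate further.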
Skewness and kurtosis values are also provided as part of this output, giving
information about the distribution of scores for the sample (or for each group, if
you specified a factor); see discussion of the meaning of these values in the previous section.
In the table labelled Tests of Normality, you are given the results of the
Kolmogorov-Smirnov statistic (a Shapiro-Wilk test is also reported in this table).
This assesses the normality of the distribution of scores. A non-significant result
(Sig. value of more than .05) indicates that the distribution does not depart
significantly from normality. In this case, the Sig. value is .000, suggesting violation
of the assumption of normality. This is quite common in larger samples, where even
small departures from normality can reach significance.
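The Kolmogorov-Smirnov statistic (D) is the largest vertical gap between the cumulative proportion of your scores and a normal curve with the same mean and standard deviation. The Python sketch below computes D only. Because the mean and standard deviation are estimated from the same data, the Sig. value needs the Lilliefors correction (SPSS notes this beneath its table), which the sketch does not attempt. The two samples are hypothetical, built from normal and exponential quantiles.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic(scores):
    """Kolmogorov-Smirnov D: the largest gap between the sample's cumulative
    proportions and a normal curve with the sample's own mean and SD.

    Only D is computed. Because the mean and SD are estimated from the same
    data, a p-value needs the Lilliefors correction, not attempted here.
    """
    ordered = sorted(scores)
    n = len(ordered)
    curve = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = curve.cdf(x)
        # Compare the curve with the cumulative step just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


n = 100
# Hypothetical samples built from quantiles: one bell-shaped, one skewed.
bell = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
print(round(ks_statistic(bell), 3), round(ks_statistic(skewed), 3))
```

The bell-shaped sample gives a very small D, while the skewed sample gives a much larger one.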
The actual shape of the distribution can be seen in the Histogram (one for each
group, if you specified a factor). In this example, scores appear to be reasonably
normally distributed. This is also supported by an inspection of the normal probability
plot (labelled Normal Q-Q Plot). In this plot, the observed value for each score is
plotted against the expected value from the normal distribution. A reasonably straight
line suggests a normal distribution.
• The Detrended Normal Q-Q Plots are obtained by plotting the actual deviation
of the scores from the straight line. There should be no real clustering of points,
with most collecting around the zero line.
• The final plot that is provided in the output is a boxplot of the distribution of
scores (one box per group, if you specified a factor). The rectangle represents the
middle 50 per cent of the cases, with the whiskers (the lines protruding from the box)
going out to the smallest and largest values that are not classified as outliers.
Sometimes you will see additional circles outside this range; these are classified by
SPSS as outliers. The line inside the rectangle is the median value.
Boxplots are discussed further in the next section on detecting outliers.
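The points on a Normal Q-Q plot, and their detrended deviations, can be sketched in Python as below. The expected z-scores use Blom's plotting positions; that choice, and expressing the deviations in z-score units, are assumptions here, and SPSS's exact formulas may differ. The scores are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(scores):
    """Return (observed score, expected z, deviation) for each case.

    Assumption: expected z-scores use Blom's plotting positions,
    (rank - 3/8) / (n + 1/4); tied scores simply get consecutive ranks here.
    The deviation (observed z minus expected z) is roughly what a detrended
    plot displays.
    """
    ordered = sorted(scores)
    n = len(ordered)
    m, s = mean(ordered), stdev(ordered)
    unit_normal = NormalDist()
    points = []
    for rank, x in enumerate(ordered, start=1):
        expected_z = unit_normal.inv_cdf((rank - 0.375) / (n + 0.25))
        observed_z = (x - m) / s
        points.append((x, expected_z, observed_z - expected_z))
    return points


for observed, expected_z, deviation in normal_qq_points([12, 15, 17, 18, 20, 21, 25]):
    print(f"{observed:>4} {expected_z:6.2f} {deviation:6.2f}")
```

If the printed deviations hover around zero without any obvious pattern, the scores are close to normal, which is what you look for in the detrended plot.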
In the example given above, the distribution of scores was reasonably ‘normal’. Often this
is not the case. Many scales and measures used in the social sciences have scores that are
skewed, either positively or negatively. This does not necessarily indicate a problem with
the scale, but rather reflects the underlying nature of the construct being measured. Life
satisfaction measures, for example, are often negatively skewed, with most people being
reasonably happy with their lot in life. Clinical measures of anxiety or depression are often
positively skewed in the general population, with most people recording relatively few
symptoms of these disorders. Some authors in this area recommend that, with skewed
data, the scores be ‘transformed’ statistically. This issue is discussed further in Chapter 8.
CHECKING FOR OUTLIERS
Many of the statistical techniques covered in this book are sensitive to outliers (cases
with values well above or well below the majority of other cases). The techniques
described in the previous section can also be used to check for outliers.
First, have a look at the Histogram. Look at the tails of the distribution. Are there
data points sitting on their own, out on the extremes? If so, these are potential
outliers. If the scores drop away in a reasonably even slope, there is probably not
too much to worry about.
Second, inspect the Boxplot. Any scores that SPSS considers to be outliers appear
as little circles with a number attached (this is the ID number of the case). SPSS
defines points as outliers if they extend more than 1.5 box-lengths from the edge
of the box (one box-length is the interquartile range). Extreme points (indicated
with an asterisk, *) are those that extend more than three box-lengths from the
edge of the box. In the example above there are no extreme points, but there are
two outliers: ID numbers 24 and 157. If you find points like this, you need to
decide what to do with them.
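SPSS's outlier rule for boxplots can be sketched in Python as follows. One box-length is the interquartile range (Q3 minus Q1). The quartiles come from Python's statistics.quantiles with the "exclusive" method, an assumption that may not match exactly how SPSS draws the box, so cases right on the boundary could be classified differently. The IDs and scores are hypothetical.

```python
from statistics import quantiles


def classify_boxplot_points(scores_by_id):
    """Label cases beyond the box: 'outlier' if more than 1.5 box-lengths from
    the nearer edge, 'extreme' if more than 3. One box-length = Q3 - Q1.

    Assumption: quartiles come from statistics.quantiles(method="exclusive");
    SPSS may place the box edges slightly differently, so borderline cases
    can be classified differently.
    """
    q1, _, q3 = quantiles(scores_by_id.values(), n=4, method="exclusive")
    box_length = q3 - q1
    flagged = {}
    for case_id, score in scores_by_id.items():
        beyond = max(q1 - score, score - q3)  # negative if inside the box
        if beyond > 3 * box_length:
            flagged[case_id] = "extreme"
        elif beyond > 1.5 * box_length:
            flagged[case_id] = "outlier"
    return flagged


# Hypothetical scores keyed by ID number.
example = dict(zip(range(1, 12), [10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 40]))
print(classify_boxplot_points(example))  # {10: 'outlier', 11: 'extreme'}
```

Here the box runs from 12 to 18, so a score more than 9 points beyond the box counts as an outlier and one more than 18 points beyond counts as an extreme point.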
It is important to check that the outlier’s score is genuine, not just an error. Check
the score and see whether it is within the range of possible scores for that variable.
Sometimes it is worth checking back with the questionnaire or data record to see
if there was a mistake in entering the data. If it is an error, correct it, and repeat
the boxplot. If it turns out to be a genuine score, you then need to decide what
you will do about it. Some statistics writers suggest removing all extreme outliers
from the data file. Others suggest changing the value to a less extreme value, thus
including the person in the analysis but not allowing the score to distort the
statistics (for more advice on this, see Chapter 4 in Tabachnick & Fidell 2007).
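One way of ‘changing the value to a less extreme value’ is to give the outlying case a score one unit higher than the next highest score in the distribution. The Python helper below is hypothetical and handles high scores only; check Tabachnick and Fidell (2007) for the options they actually recommend before choosing a rule. It returns a new set of scores, so the original values are kept for your records.

```python
def pull_in_high_scores(scores_by_id, flagged_ids):
    """Hypothetical helper: give each flagged high score a value one unit above
    the highest unflagged score, so the case stays in the analysis but no
    longer sits far out in the tail. Returns a new dict; the original is kept.
    """
    ordinary = [s for cid, s in scores_by_id.items() if cid not in flagged_ids]
    ceiling = max(ordinary) + 1
    return {
        cid: min(s, ceiling) if cid in flagged_ids else s
        for cid, s in scores_by_id.items()
    }


print(pull_in_high_scores({1: 10, 2: 12, 3: 15, 4: 40}, {4}))  # {1: 10, 2: 12, 3: 15, 4: 16}
```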
The information in the Descriptives table can give you an indication of how
much of a problem these outlying cases are likely to be. The value you are inter-
ested in is the 5% Trimmed Mean. If the trimmed mean and mean values are very
different, you may need to investigate these data points further. In this example,
the two mean values (26.73 and 26.64) are very similar. Given this, and the fact
that the values are not too different from the remaining distribution, I will retain
these cases in the data file.
If you wish to change or remove values in your file, go to the Data Editor window,
sort the data file in descending order (to find the people with the highest values),
or in ascending order if you are concerned about cases with very low values. The cases you
need to look at in more detail are then at the top of the data file. Move across to
the column representing that variable and modify or delete the value of concern.
Always record changes to your data file in a log book.
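The sorting and logging steps above can be sketched in Python as follows (the helper names and the scores are hypothetical). The first function lists cases from highest to lowest, like a descending sort in the Data Editor; the second changes a value and writes an entry in a change log.

```python
def highest_cases(scores_by_id, how_many=5):
    """(ID, score) pairs from highest to lowest, like sorting the Data Editor
    in descending order on one variable."""
    return sorted(scores_by_id.items(), key=lambda item: item[1], reverse=True)[:how_many]


def change_score(scores_by_id, case_id, new_score, reason, log):
    """Change one case's score and write an entry in the change log."""
    old_score = scores_by_id[case_id]
    scores_by_id[case_id] = new_score
    log.append(f"ID {case_id}: {old_score} -> {new_score} ({reason})")


scores = {101: 64, 102: 44, 103: 30, 104: 27}  # hypothetical scores keyed by ID
log = []
print(highest_cases(scores, how_many=2))  # [(101, 64), (102, 44)]
change_score(scores, 101, 46, "entry error; questionnaire shows 46", log)
print(log)
```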
ADDITIONAL EXERCISES
Business
Data file: staffsurvey.sav. See Appendix for details of the data file.
1. Follow the procedures covered in this chapter to generate appropriate descriptive
statistics to answer the following questions.
(a) What percentage of the staff in this organisation are permanent employees?
(Use the variable employstatus.)
(b) What is the average length of service for staff in the organisation? (Use the
variable service.)
(c) What percentage of respondents would recommend the organisation to others
as a good place to work? (Use the variable recommend.)
2. Assess the distribution of scores on the Total Staff Satisfaction Scale (totsatis) for
employees who are permanent versus casual (employstatus).
(a) Are there any outliers on this scale that you would be concerned about?
(b) Are scores normally distributed for each group?
Health
Data file: sleep4ED.sav. See Appendix for details of the data file.
1. Follow the procedures covered in this chapter to generate appropriate descriptive
statistics to answer the following questions.
(a) What percentage of respondents are female (gender)?
(b) What is the average age of the sample?
(c) What percentage of the sample indicated that they had a problem with their
sleep (probsleeprec)?
(d) What is the median number of hours of sleep per weeknight (hourweeknight)?
2. Assess the distribution of scores on the Sleepiness and Associated Sensations Scale
(totSAS) for people who feel that they do/don’t have a sleep problem (probsleeprec).
(a) Are there any outliers on this scale that you would be concerned about?
(b) Are scores normally distributed for each group?
7
Using graphs to describe and explore the data
While the numerical values obtained in Chapter 6 provide useful information
concerning your sample and your variables, some aspects are better explored visually.
SPSS provides a number of different types of graphs (also referred to as charts). In this
chapter, I’ll cover the basic procedures to obtain the following graphs:
• histograms
• bar graphs
• line graphs
• scatterplots
• boxplots.
In SPSS there are a number of different ways of generating graphs, using the Graphs
menu option. These include Chart Builder, Graphboard Template Chooser, and
Legacy Dialogs.
In this chapter I will demonstrate the Legacy Dialogs approach, which I find the
easiest way to generate graphs. Spend some time playing with each of the different
graphs and exploring their possibilities. In this chapter only a brief overview is given
to get you started. To illustrate the various graphs I have used the survey4ED.sav
data file, which is included on the website accompanying this book (see p. viii and the
Appendix for details). If you wish to follow along with the procedures described in
this chapter, you will need to start SPSS and open the file labelled survey4ED.sav.
At the end of this chapter, instructions are also given on how to edit a graph
to better suit your needs. This may be useful if you intend to use the graph in your
research paper. The procedure for importing graphs directly into Microsoft Word is
also detailed.
HISTOGRAMS
Histograms are used to display the distribution of a single continuous variable (e.g.
age, perceived stress scores).
Procedure for creating a histogram
1. From the menu click on Graphs, then select Legacy Dialogs. Choose
Histogram.
2. Click on your variable of interest and move it into the Variable box. This
should be a continuous variable (e.g. Total perceived stress: tpstress).
3. If you would like to generate separate histograms for different groups
(e.g. male/female), you could put an additional variable (e.g. sex) in the
Panel by: section. Choose Rows if you would like the two graphs on top
of one another, or Columns if you want them side by side. In this example,
I will put the sex variable in the Columns box.
4. Click on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
GRAPH
/HISTOGRAM=tpstress
/PANEL COLVAR=sex COLOP=CROSS .
The output generated from this procedure is shown below.
Interpretation of output from Histogram
Inspection of the shape of the histogram provides information about the distribution
of scores on the continuous variable. Many of the statistics discussed in this manual
assume that the scores on each of the variables are normally distributed (i.e. follow the
shape of the normal curve). In this example the scores are reasonably normally distrib-
uted, with most scores occurring in the centre, tapering out towards the extremes. It is
quite common in the social sciences, however, to find that variables are not normally
distributed. Scores may be skewed to the left or right or, alternatively, arranged in a
rectangular shape. For further discussion of the assessment of the normality of vari-
ables see Chapter 6.
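A histogram simply counts how many scores fall into each of a series of equal-width intervals. The Python sketch below draws a rough text version for whole-number scores (the scores are hypothetical). Empty intervals are kept, so a score sitting out on its own in the tail stands out.

```python
from collections import Counter


def text_histogram(scores, bin_width):
    """Count whole-number scores into equal-width bins, one row of '*' per bin.

    Empty bins are kept so that gaps in the distribution remain visible.
    """
    counts = Counter((score // bin_width) * bin_width for score in scores)
    rows = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        rows.append(f"{start:>4}-{start + bin_width - 1:<4} {'*' * counts[start]}")
    return rows


example_scores = [12, 15, 18, 21, 22, 23, 24, 26, 27, 29, 31, 33, 47]
for row in text_histogram(example_scores, 5):
    print(row)
```

The single score of 47, separated from the rest by two empty intervals, is the kind of isolated point worth checking as a possible outlier.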
BAR GRAPHS
Bar graphs can be simple or very complex, depending on how many variables you
wish to include. The bar graph can show the number of cases in particular categories,
or it can show the score on some continuous variable for different categories. Basi-
cally, you need two main variables—one categorical and one continuous. You can also
break this down further with another categorical variable if you wish.
Procedure for creating a bar graph
1. From the menu at the top of the screen, click on Graphs, then select
Legacy Dialogs. Choose Bar. Click on Clustered.
2. In the Data in chart are section, click on Summaries for groups of cases.
Click on Define.
3. In the Bars represent box, click on Other statistic (e.g. mean).
4. Click on the continuous variable you are interested in (e.g. Total perceived
stress: tpstress). This should appear in the box listed as Mean (Total
perceived stress). This indicates that the mean on the Perceived Stress
Scale for the different groups will be displayed.
5. Click on your first categorical variable (e.g. agegp3). Click on the arrow
button to move it into the Category axis box. This variable will appear
across the bottom of your bar graph (X axis).
6. Click on another categorical variable (e.g. sex) and move it into the
Define Clusters by: box. This variable will be represented in the legend.
7. If you would like to display error bars on your graph, click on the Options
button and click on Display error bars. Choose what you want the bars to
represent (e.g. confidence intervals).
8. Click on Continue and then OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
GRAPH
/BAR(GROUPED)=MEAN(tpstress) BY agegp3 BY sex
/INTERVAL CI(95.0).
The output generated from this procedure is shown below.
Interpretation of output from Bar Graph
The output from this procedure gives you a quick summary of the distribution of
scores for the groups that you have requested (in this case, males and females from
the different age groups). The graph presented above suggests that females had higher
perceived stress scores than males, and that this difference is more pronounced among
the two older age groups. Among the 18 to 29 age group, the difference in scores
between males and females is very small.
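Behind a clustered bar graph of MEAN(tpstress) BY agegp3 BY sex, each bar is just the mean for one combination of the two categorical variables. The Python sketch below computes those cell means; the cases and the numeric codes for agegp3 and sex are hypothetical, and cases with a missing score are skipped.

```python
from collections import defaultdict
from statistics import mean


def cell_means(rows, category, cluster, score):
    """Mean score for each (category, cluster) combination: the bar heights of
    a clustered bar graph of MEAN(score) BY category BY cluster. Cases with a
    missing score (None) are skipped."""
    groups = defaultdict(list)
    for row in rows:
        if row[score] is not None:
            groups[(row[category], row[cluster])].append(row[score])
    return {key: mean(values) for key, values in sorted(groups.items())}


# Hypothetical cases; the codes for agegp3 and sex are illustrative only.
rows = [
    {"agegp3": 1, "sex": 1, "tpstress": 26},
    {"agegp3": 1, "sex": 1, "tpstress": 28},
    {"agegp3": 1, "sex": 2, "tpstress": 27},
    {"agegp3": 2, "sex": 2, "tpstress": None},
    {"agegp3": 2, "sex": 2, "tpstress": 30},
]
print(cell_means(rows, "agegp3", "sex", "tpstress"))
```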
Care should be taken when interpreting the output from Bar Graph. You should
always look at the scale used on the Y (vertical) axis. Sometimes what looks like a
dramatic difference is really only a few scale points and, therefore, probably of little
importance. This is clearly evident in the bar graph displayed above. You will see that
the difference between the groups is quite small when you consider the scale used to
display the graph. The difference between the smallest score (males aged 45 or more)
and the highest score (females aged 18 to 29) is only about three points.
To assess the significance of any difference you might find between groups, it is
necessary to conduct further statistical analyses. In this case, a two-way, between-groups
analysis of variance (see Chapter 19) would be conducted to find out if the
differences are statistically significant.
LINE GRAPHS
A line graph allows you to inspect the mean scores of a continuous variable across a
number of different values of a categorical variable (e.g. time 1, time 2, time 3). Line graphs are
also useful for graphically exploring the results of a one- or two-way analysis of variance.
Line graphs are provided as an optional extra in the output of analysis of variance (see
Chapters 18 and 19). The following procedure shows you how to generate a line graph
using the same variables as in the previous procedure for bar graphs.
Procedure for creating a line graph
1. From the menu at the top of the screen, select Graphs, then Legacy
Dialogs, then Line.
2. Click on Multiple. In the Data in Chart Are section, click on Summaries for
groups of cases. Click on Define.
3. In the Lines represent box, click on Other statistic. Click on the
continuous variable you are interested in (e.g. Total perceived stress:
tpstress). Click on the arrow button. The variable should appear in
the box listed as Mean (Total perceived stress). This indicates that the
mean on the Perceived Stress Scale for the different groups will be
displayed.
4. Click on your first categorical variable (e.g. agegp3). Click on the arrow
button to move it into the Category Axis box. This variable will appear
across the bottom of your line graph (X axis).
5. Click on another categorical variable (e.g. sex) and move it into the
Define Lines by: box. This variable will be represented in the legend.
6. If you would like to add error bars to your graph, you can click on the
Options button. Click on the Display error bars box and choose what you
would like the error bars to represent (e.g. confidence intervals).
7. Click on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
GRAPH
/LINE(MULTIPLE)=MEAN(tpstress) BY agegp3 BY sex.
The output generated from this procedure is shown below.
[Figure: line graph of mean total perceived stress by age 3 groups (18-29, 30-44, 45+), with separate lines for MALES and FEMALES]
Interpretation of output from Line Chart
First, you can look at the impact of age on perceived stress for each of the sexes
separately. Younger males appear to have higher levels of perceived stress than
either middle-aged or older males. For females, the difference across the age
groups is not quite so pronounced. The older females are only slightly less stressed
than the younger group.
You can also consider the difference between males and females. Overall, males
appear to have lower levels of perceived stress than females. Although the differ-
ence for the younger group is only small, there appears to be a discrepancy for
the older age groups. Whether or not these differences reach statistical significance
can be determined only by performing a two-way analysis of variance (see
Chapter 19).
The results presented above suggest that to understand the impact of age on
perceived stress you must consider the respondent’s gender. This sort of relationship
is referred to, when doing analysis of variance, as an interaction effect. While the use
of a line graph does not tell you whether this relationship is statistically significant, it
certainly gives you a lot of information and raises a lot of additional questions.
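One way to make the interaction idea concrete is to compute the sex difference within each age group: if the gap changes from one age group to the next, the effect of sex depends on age. The Python sketch below does this for hypothetical cell means (not the results from survey4ED.sav). It only describes the pattern; the two-way analysis of variance in Chapter 19 is what tests it.

```python
def sex_gap_by_age(cell_means):
    """Female-minus-male difference in mean score within each age group.

    A gap that changes across age groups is the pattern an interaction effect
    describes; whether it is statistically significant needs a two-way ANOVA.
    """
    ages = sorted({age for age, _ in cell_means})
    return {age: cell_means[(age, "FEMALES")] - cell_means[(age, "MALES")] for age in ages}


# Hypothetical cell means, for illustration only (not the survey results).
example_means = {
    ("18-29", "MALES"): 27.0, ("18-29", "FEMALES"): 27.5,
    ("30-44", "MALES"): 25.5, ("30-44", "FEMALES"): 27.5,
    ("45+", "MALES"): 25.0, ("45+", "FEMALES"): 27.0,
}
print(sex_gap_by_age(example_means))  # {'18-29': 0.5, '30-44': 2.0, '45+': 2.0}
```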
Sometimes in interpreting the output it is useful to consider other questions. In
this case, the results suggest that it may be worthwhile to explore in more depth the
relationship between age and perceived stress for the two groups (males and females).
To do this I decided to split the sample, not just into three groups for age, as in the
above graph, but into five groups to get more detailed information concerning
the influence of age.
After dividing the group into five equal groups (by creating a new variable,
age5gp; instructions for this process are presented in Chapter 8), a new line graph
was generated. This gives us a clearer picture of the influence of age than the previous
line graph using only three age groups.
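Splitting a continuous variable into five groups of roughly equal size means finding four cut points (the 20th, 40th, 60th and 80th percentiles) and assigning each person to a group. The Python sketch below illustrates the idea with hypothetical ages; Chapter 8 describes the SPSS procedure, which may place the cut points slightly differently, particularly when many people share the same age.

```python
from bisect import bisect_right
from statistics import quantiles


def equal_sized_groups(values, k=5):
    """Split values into k groups of roughly equal size at percentile cut points.

    Returns (group number 1..k for each value, list of cut points).
    Assumption: cut points from statistics.quantiles(method="exclusive");
    with many tied values the groups can end up unequal.
    """
    cuts = quantiles(values, n=k, method="exclusive")
    return [bisect_right(cuts, v) + 1 for v in values], cuts


ages = list(range(18, 68))  # hypothetical ages, one person per year of age
groups, cuts = equal_sized_groups(ages)
print(cuts)
print([groups.count(g) for g in range(1, 6)])
```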
[Figure: line graph of mean total perceived stress by age 5 groups (18-24, 25-32, 33-40, 41-49, 50+), with separate lines for MALES and FEMALES]
SCATTERPLOTS
Scatterplots are typically used to explore the relationship between two continuous
variables (e.g. age and self-esteem). It is a good idea to generate a scatterplot before
calculating correlations (see Chapter 11). The scatterplot will give you an indication
of whether your variables are related in a linear (straight-line) or curvilinear fashion.
Only linear relationships are suitable for correlation analyses.
The scatterplot will also indicate whether your variables are positively related
(high scores on one variable are associated with high scores on the other) or nega-
tively related (high scores on one are associated with low scores on the other). For
positive correlations, the points form a line pointing upwards to the right (that is,
they start low on the left-hand side and move higher on the right). For negative corre-
lations, the line starts high on the left and moves down on the right (see an example
of this in the output below).
The scatterplot also provides a general indication of the strength of the relationship
between your two variables. If the relationship is weak the points will be all over the
place, in a blob-type arrangement. For a strong relationship the points will form a vague
cigar shape, with a definite clumping of scores around an imaginary straight line.
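The direction and strength you judge by eye on a scatterplot correspond to the sign and size of the Pearson correlation coefficient. The Python sketch below computes r for hypothetical paired scores. The last example is a useful warning: a perfect U-shaped (curvilinear) relationship gives r = 0, which is why you check the scatterplot for linearity first.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired scores.

    Negative r: points run from top-left to bottom-right; r near 0: no
    straight-line trend. Only meaningful for roughly linear relationships.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 3))          # -1.0
print(round(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]), 3))  # 0.0
```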
In the example that follows, I request a scatterplot of scores on two of the scales
in the survey: the Total perceived stress and the Total Perceived Control of Internal
States Scale (PCOISS). I have asked for two groups in my sample (males and females)
to be represented separately on the one scatterplot (using different symbols). This not
only provides me with information concerning my sample as a whole but also gives
additional information on the distribution of scores for males and females.
If you would prefer to have separate scatterplots for each group, you can specify a
categorical variable in the Panel by: section instead of the Set Markers by: box shown
below. If you wish to obtain a scatterplot for the full sample (not split by group), just
skip step 6 below (the Set Markers by: box).
Procedure for creating a scatterplot
1. From the menu at the top of the screen, click on Graphs, then Legacy
Dialogs and then on Scatter/Dot.
2. Click on Simple Scatter and then Define.
3. Click on your first variable, usually the one you consider to be the dependent
variable (e.g. Total perceived stress: tpstress).
4. Click on the arrow to move it into the box labelled Y axis. This variable
will appear on the vertical axis.
5. Move your other variable (e.g. Total PCOISS: tpcoiss) into the box labelled
X axis. This variable will appear on the horizontal axis.
6. You can also have SPSS mark each of the points according to some other
categorical variable (e.g. sex). Move this variable into the Set Markers by:
box. This will display males and females using different markers.
7. Move the ID variable into the Label Cases by: box. This will allow you to
find out the ID number of a case from the graph if you find an outlier.
8. Click on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
GRAPH
/SCATTERPLOT(BIVAR)=tpcoiss WITH tpstress BY sex BY id (IDENTIFY)
/MISSING=LISTWISE .
The output generated from this procedure, modified slightly for display
purposes, is shown below.
[Figure: scatterplot of total perceived stress (Y axis) against total PCOISS (X axis), with MALES and FEMALES shown using different markers]
Interpretation of output from Scatterplot
From the scatterplot output above, there appears to be a moderate, negative correlation
between the two variables (Perceived Stress and PCOISS) for the sample as a whole.
Respondents with high levels of perceived control (shown on the X, or horizontal, axis)
experience lower levels of perceived stress (shown on the Y, or vertical, axis). On the
other hand, people with low levels of perceived control have much greater perceived
stress.
Remember, the scatterplot does not give you definitive answers; you need to follow
it up with the calculation of the appropriate statistic. There is no indication of a curvi-
linear relationship, so it would be appropriate to calculate a Pearson product-moment
correlation for these two variables (see Chapter 11) if the distributions are roughly
normal (check the histograms for these two variables).
In the example above, I have looked at the relationship between only two vari-
ables. It is also possible to generate a matrix of scatterplots between a whole group
of variables. This is useful as preliminary assumption testing for analyses such as
MANOVA.
Procedure to generate a matrix of scatterplots
1. From the menu at the top of the screen, click on Graphs, then Legacy
Dialogs and then on Scatter/Dot.
2. Click on Matrix Scatter. Click on the Define button.
3. Select all of your continuous variables (tnegaff, tposaff, tpstress) and
move them into the Matrix Variables box.
4. Select the sex variable and move it into the Rows box.
5. Click on the Options button and select Exclude cases variable by variable.
6. Click on Continue and then OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
GRAPH
/SCATTERPLOT(MATRIX)=tposaff tnegaff tpstress
/PANEL ROWVAR=sex ROWOP=CROSS
/MISSING=VARIABLEWISE .
The output generated from this procedure is shown below.
BOXPLOTS
Boxplots are useful when you wish to compare the distribution of scores on variables.
You can use them to explore the distribution of one continuous variable (e.g. positive
affect) or, alternatively, you can ask for scores to be broken down for different groups
(e.g. age groups). You can also add an extra categorical variable to compare (e.g. males
and females). In the example below, I will explore the distribution of scores on the
Positive Affect scale for males and females.
Procedure for creating a boxplot
1. From the menu at the top of the screen, click on Graphs, then select
Legacy Dialogs and then Boxplot.
2. Click on Simple. In the Data in Chart Are section, click on Summaries for
groups of cases. Click on the Define button.
3. Click on your continuous variable (e.g. Total Positive Affect: tposaff). Click
on the arrow button to move it into the Variable box.
4. Click on your categorical variable (e.g. sex). Click on the arrow button to
move it into the Category axis box.
5. Click on ID and move it into the Label cases box. This will allow you to
identify the ID numbers of any cases with extreme values.
6. Click on OK (or on Paste to save to Syntax Editor).
The syntax generated from this procedure is:
EXAMINE
VARIABLES=tposaff BY sex
/PLOT=BOXPLOT/STATISTICS=NONE/NOTOTAL/ID=id.
The output generated from this procedure is shown as follows.
Interpretation of output from Boxplot
The output from Boxplot gives you a lot of information about the distribution of
your continuous variable and the possible influence of your other categorical variable
(and cluster variable if used).
Each distribution of scores is represented by a box and protruding lines (called
whiskers). The length of the box is the variable’s interquartile range and contains
the middle 50 per cent of cases. The line across the inside of the box represents the
median value. The whiskers protruding from the box go out to the variable’s smallest
and largest values that are not classified as outliers.
Any scores that SPSS considers to be outliers appear as little circles with a number
attached (this is the ID number of the case). Outliers are cases with scores that are
quite different from the remainder of the sample, either much higher or much
lower. SPSS defines points as outliers if they extend more than 1.5 box-lengths
from the edge of the box. Extreme points (indicated with an asterisk, *) are those
that extend more than three box-lengths from the edge of the box. For more
information on outliers, see Chapter 6. In the example above, there are a number
of outliers at the low values for Positive Affect for both males and females.
In addition to providing information on outliers, a boxplot allows you to inspect
the pattern of scores for your various groups. It provides an indication of the
variability in scores within each group and allows a visual inspection of the differ-
ences between groups. In the example presented above, the distribution of scores
on Positive Affect for males and females is very similar.
EDITING A CHART OR GRAPH
Sometimes modifications need to be made to the titles, labels, markers etc. of a graph
before you can print it or use it in your report. For example, I have edited some of the
graphs displayed in this chapter to make them clearer (e.g. changing the patterns in the
bar graph, thickening the lines used in the line graph). To edit a chart or graph, you
need to open the Chart Editor window. To do this, place your cursor on the graph that
you wish to modify. Double-click and a new window will appear showing your graph,
complete with additional menu options and icons (see Figure 7.1).
You should see a smaller Properties window pop up, which allows you to make
changes to your graphs. If this does not appear, click on the Edit menu and select
Properties.
There are a number of changes you can make while in Chart Editor:
• To change the words used in labels, click once on the label to highlight it (a gold-coloured
box should appear around the text). Click once again to edit the text (a
red cursor should appear). Modify the text and then press Enter on your keyboard
when you have finished.
• To change the position of the X and Y axis labels (e.g. to centre them), double-click
on the title you wish to change. In the Properties box, click on the Text Layout
tab. In the section labelled Justify, choose the position you want (the dot means
centred, the left arrow moves it to the left, and the right arrow moves it to the
right).
• To change the characteristics of the text, lines, markers, colours, patterns and scale
used in the chart, click once on the aspect of the graph that you wish to change.
The Properties window will adjust its options depending on the aspect you click
on. The various tabs in this box will allow you to change aspects of the graph. If
you want to change one of the lines of a multiple-line graph (or markers for a
group), you will need to highlight the specific category in the legend (rather than
on the graph itself). This is useful for changing one of the lines to dashes so that
it is more clearly distinguishable when printed out in black and white.
The best way to learn how to use these options is to experiment—so go ahead and
play!
IMPORTING CHARTS AND GRAPHS INTO WORD
DOCUMENTS
SPSS allows you to copy charts directly into your word processor (e.g. Microsoft
Word). This is useful when you are preparing the final version of your report and
want to present some of your results in the form of a graph. Sometimes a graph will
present your results more simply and clearly than numbers in a box. Don’t go overboard:
use graphs only for special effect. Make sure you modify the graph in SPSS to make
it as clear as possible before transferring it to Word.
Figure 7.1 Example of a Chart Editor menu bar
Procedure for importing a chart into a Word document
Windows allows you to have more than one program open at a time. To
transfer between SPSS and Word, you will need to have both of these
programs open. It is possible to swap backwards and forwards between the
two just by clicking on the appropriate icon in the taskbar at the bottom of
your screen, or from the Window menu. This is like shuffling pieces of paper
around on your desk.
1. Start Word and open the file in which you would like the graph to appear.
Click on the SPSS icon on the bottom of your screen to return to SPSS.
2. In SPSS make sure you have the Viewer window on the screen in front of
you.
3. Click once on the graph that you would like to copy. A border should
appear around the graph.
4. Click on Edit (from the menu at the top of the screen) and then choose
Copy. This saves the chart to the clipboard (you won’t be able to see it,
however).
5. From the list of minimised programs at the bottom of your screen, click
on your Word document.
6. In the Word document, place your cursor where you wish to insert the
chart.
7. Click on Edit from the Word menu and choose Paste. Or just click on the
Paste icon on the top menu bar (it looks like a clipboard).
8. Click on File and then Save to save your Word document.
9. To move back to SPSS to continue with your analyses just click on the
SPSS icon, which should be listed at the bottom of your screen. With both
programs open you can just jump backwards and forwards between the
two programs, copying charts, tables etc. There is no need to close either
of the programs until you have finished completely. Just remember to
save as you go along.
ADDITIONAL EXERCISES
Business
Data file: staffsurvey4ED.sav. See Appendix for details of the data file.
1. Generate a histogram to explore the distribution of scores on the Staff Satisfaction
Scale (totsatis).
2. Generate a bar graph to assess the staff satisfaction levels for permanent versus
casual staff employed for less than or equal to 2 years, 3 to 5 years and 6 or more
years. The variables you will need are totsatis, employstatus and servicegp3.
3. Generate a scatterplot to explore the relationship between years of service and
staff satisfaction. Try first using the service variable (which is very skewed)
and then try again with the variable towards the bottom of the list of variables
(logservice). This new variable is a mathematical transformation (log 10) of the
original variable (service), designed to adjust for the severe skewness. This procedure
is covered in Chapter 8.
4. Generate a boxplot to explore the distribution of scores on the Staff Satisfaction
Scale (totsatis) for the different age groups (age).
5. Generate a line graph to compare staff satisfaction for the different age groups
(use the agerecode variable) for permanent and casual staff.
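If you prefer to work from the Syntax Editor, the five Business exercises can be sketched with the older GRAPH and EXAMINE commands, as below. This is a minimal sketch, not the syntax the Chart Builder pastes (that uses GGRAPH), so the charts will look a little different; the variable names are the ones given in the exercises.

```spss
* Sketch only: legacy GRAPH and EXAMINE commands for the Business exercises.

* Ex 1: histogram of total staff satisfaction.
GRAPH /HISTOGRAM=totsatis.

* Ex 2: clustered bars of mean satisfaction by service group and employment status.
GRAPH /BAR(GROUPED)=MEAN(totsatis) BY servicegp3 BY employstatus.

* Ex 3: scatterplots with the raw (skewed) service variable, then logservice.
GRAPH /SCATTERPLOT(BIVAR)=service WITH totsatis.
GRAPH /SCATTERPLOT(BIVAR)=logservice WITH totsatis.

* Ex 4: boxplots of satisfaction for each age group.
EXAMINE VARIABLES=totsatis BY age
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.

* Ex 5: mean satisfaction across age groups with one line per employment status.
GRAPH /LINE(MULTIPLE)=MEAN(totsatis) BY agerecode BY employstatus.
```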
Health
Data file: sleep4ED.sav. See Appendix for details of the data file.
1. Generate a histogram to explore the distribution of scores on the Epworth
Sleepiness Scale (ess).
2. Generate a bar graph to compare scores on the Sleepiness and Associated Sensations
Scale (totSAS) across three age groups (agegp3) for males and females
(gender).
3. Generate a scatterplot to explore the relationship between scores on the Epworth
Sleepiness Scale (ess) and the Sleepiness and Associated Sensations Scale (totSAS).
Ask for different markers for males and females (gender).
4. Generate a boxplot to explore the distribution of scores on the Sleepiness and
Associated Sensations Scale (totSAS) for people who report that they do/don’t
have a problem with their sleep (probsleeprec).
5. Generate a line graph to compare scores on the Sleepiness and Associated Sensations
Scale (totSAS) across the different age groups (use the agegp3 variable) for
males and females (gender).
8
Manipulating the data
Once you have entered the data and the data file has been checked for accuracy, the
next step involves manipulating the raw data into a form that you can use to conduct
analyses and to test your hypotheses. Depending on the data file, your variables of
interest and the type of research questions that you wish to address, this process may
include:
• adding up the scores from the items that make up each scale to give an overall
score for scales such as self-esteem, optimism, perceived stress etc. SPSS does this
quickly, easily and accurately, so don’t even think about doing this by hand for each
separate case
• transforming skewed variables for analyses that require normally distributed
scores
• collapsing continuous variables (e.g. age) into categorical variables (e.g. young,
middle-aged and old) to do some analyses such as analysis of variance
• reducing or collapsing the number of categories of a categorical variable (e.g.
collapsing the marital status into just two categories representing people ‘in a
relationship’/‘not in a relationship’).
When you make changes to the variables in your data file, it is important that
you note this in your codebook. Another way that you can keep a record of all the
changes made to your data file is to use the SPSS Syntax option that is available in all
SPSS procedures. I will describe this process first, before demonstrating how to recode
and transform your variables.
Using Syntax to record procedures
As discussed previously in Chapter 3, SPSS has a Syntax Editor window that can
be used to record the commands generated using the Windows menus for each
procedure. To access the syntax, follow the instructions shown in the procedure sections
that follow, but stop before clicking the final OK button. Instead, click on the Paste button.
This will open a new window, the Syntax Editor, showing the commands you have
selected. Figure 8.1 shows part of the Syntax Editor window that was used to recode
items and compute the total scores used in survey4ED.sav. The complete syntax file
(surveysyntax.sps) can be downloaded from the SPSS Survival Manual website.
The commands pasted to the Syntax Editor are not executed until you choose to
run them. To run a command, highlight the specific command (making sure you
include the fi nal full stop) and then click on the Run menu option or the arrow icon
from the menu bar. Alternatively, you can select the name of the analysis you wish to
run from the left-hand side of the screen.
Extra comments can be added to the syntax file by starting them with an asterisk
(*). If you add comments, make sure you leave at least one line of space both before
and after syntax commands.
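As an illustration of layout, a short commented syntax file might look like the sketch below. It reuses the Optimism Scale commands shown later in this chapter; note that in survey4ED.sav the variables Rop2, Rop4, Rop6 and toptim already exist, so running it there would simply recreate them. Each comment starts with an asterisk and ends with a full stop; without the full stop, SPSS can read the following line as part of the comment.

```spss
* Reverse the negatively worded Optimism Scale items.

RECODE op2 op4 op6 (1=5) (2=4) (3=3) (4=2) (5=1) INTO Rop2 Rop4 Rop6.
EXECUTE.

* Total optimism score from the six items (reversed items where needed).

COMPUTE toptim = op1 + op3 + op5 + Rop2 + Rop4 + Rop6.
EXECUTE.
```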
For each of the procedures described in the following sections, the syntax will also
be shown.
Figure 8.1 Example of a Syntax Editor window
CALCULATING TOTAL SCALE SCORES
Before you can perform statistical analyses on your data set, you need to calculate total
scale scores for any scales used in your study. This involves two steps:
• Step 1: reverse any negatively worded items.
• Step 2: add together scores from all the items that make up the subscale or scale.
It is important that you understand the scales and measures that you are using for
your research. You should check with the scale’s manual or the journal article it was
published in to find out which items, if any, need to be reversed and how to go about
calculating a total score. Some scales consist of a number of subscales that either can,
or alternatively should not, be added together to give an overall score. It is important
that you do this correctly, and it is much easier to do it right the first time than to have
to repeat analyses later.
Important: you should do this only when you have a complete data file as SPSS
does not update these commands when you add extra data.
Step 1: Reversing negatively worded items
In some scales the wording of particular items has been reversed to help prevent
response bias. This is evident in the Optimism Scale used in the survey (see Appendix).
Item 1 is worded in a positive direction (high scores indicate high optimism): ‘In
uncertain times I usually expect the best.’ Item 2, however, is negatively worded (high
scores indicate low optimism): ‘If something can go wrong for me it will.’ Items 4 and
6 are also negatively worded. The negatively worded items need to be reversed before
a total score can be calculated for this scale. We need to ensure that all items are scored
so that high scores indicate high levels of optimism.
The procedure for reversing items 2, 4 and 6 of the Optimism Scale is shown in
the table that follows. A five-point Likert-type scale was used for the Optimism Scale;
therefore, scores for each item can range from 1 (strongly disagree) to 5 (strongly
agree).
Although it is possible to rescore variables into the same variable name, we will
ask SPSS to create new variables rather than overwrite the existing data. This is a
much safer option, and it retains our original data unchanged.
If you wish to follow along with the instructions shown below, you should open
survey4ED.sav.
1. From the menu at the top of the screen, click on Transform, then click on
Recode Into Different Variables.
2. Select the items you want to reverse (op2, op4, op6). Move these into the
Input Variable -> Output Variable box.
3. Click on the first variable (op2) and type a new name in the Output
Variable section on the right-hand side of the screen and then click the
Change button. I have used Rop2 in the existing data file. If you wish to
create your own (rather than overwrite the ones already in the data file),
use another name (e.g. revop2). Repeat for each of the other variables
you wish to reverse (op4 and op6).
4. Click on the Old and New Values button.
• In the Old Value section, type 1 in the Value box.
• In the New Value section, type 5 in the Value box (this will change all
scores that were originally scored as 1 to a 5).
5. Click on Add. This will place the instruction (1 --> 5) in the box labelled
Old --> New.
6. Repeat the same procedure for the remaining scores. For example:
Old value: type in 2; New value: type in 4; click Add
Old value: type in 3; New value: type in 3; click Add
Old value: type in 4; New value: type in 2; click Add
Old value: type in 5; New value: type in 1; click Add
Always double-check the item numbers that you specify for recoding and
the old and new values that you enter. Not all scales use a five-point scale;
some have four possible responses, some six and some seven. Check that
you have reversed all the possible values for your particular scale.
7. Click on Continue and then OK (or on Paste if you wish to paste this
command to the Syntax Editor window). To execute it after pasting to the
Syntax Editor, highlight the command and select Run from the menu.
The syntax generated for this command is:
RECODE
op2 op4 op6
(1=5) (2=4) (3=3) (4=2) (5=1) INTO Rop2 Rop4 Rop6 .
EXECUTE .
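If you prefer, the same reversal can be written as a single calculation, which adapts easily to the four-, six- and seven-point scales mentioned in step 6: subtract each score from the highest possible value plus the lowest possible value. A sketch for the Optimism items follows, using the revop names suggested in step 3 (these are not in survey4ED.sav). One difference worth knowing: an out-of-range code such as 7 would become -1 with COMPUTE, whereas RECODE INTO sets any value not listed to system-missing, so check for out-of-range values first.

```spss
* Alternative to RECODE: reversed score = (highest + lowest possible value) - score.
* For the 1-5 Optimism items that is 6 - score (use 5 - score for 1-4, 8 - score for 1-7).
COMPUTE revop2 = 6 - op2.
COMPUTE revop4 = 6 - op4.
COMPUTE revop6 = 6 - op6.
EXECUTE.
```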
The new variables with reversed scores should be found at the end of the data file.
To check this, go to your Data Editor window, choose the Variable View tab and go down to
the bottom of the list of variables. In the survey4ED.sav file you will see a whole series
of variables with an R at the front of the variable name. These are the items that I have
reversed. If you follow the instructions shown above, you should see yours at the very
bottom with ‘rev’ at the start of each. It is important to check your recoded variables
to see what effect the recode had on the values. For the first few cases in your data set,
take note of the scores on the original variables and then check the corresponding
reversed variables to ensure that the recode worked properly.
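The checks described above can also be run from syntax. In this sketch, which uses the Rop variables already in survey4ED.sav (substitute revop2 and so on if you created your own), LIST prints the original and reversed scores for the first ten cases. CROSSTABS then covers the whole sample: every non-empty cell should pair 1 with 5, 2 with 4, 3 with 3, 4 with 2 and 5 with 1.

```spss
* Original and reversed scores for the first ten cases.
LIST VARIABLES=op2 Rop2 op4 Rop4 op6 Rop6
  /CASES=FROM 1 TO 10.

* Whole-sample check: each original value should pair with only one reversed value.
CROSSTABS /TABLES=op2 BY Rop2.
CROSSTABS /TABLES=op4 BY Rop4.
CROSSTABS /TABLES=op6 BY Rop6.
```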
Step 2: Adding up the total scores for the scale
After you have reversed the negatively worded items in the scale, you will be ready to
calculate total scores for each subject.
Important: you should do this only when you have a complete data file as SPSS
does not update this command when you add extra data.
Procedure for calculating total scale scores
1. From the menu at the top of the screen, click on Transform, then click on
Compute Variable.
2. In the Target Variable box, type in the new name you wish to give to the
total scale scores. (It is useful to use a T prefix to indicate total scores, as
this makes them easier to find in the list of variables when you are doing
your analyses.)
Important: make sure you do not accidentally use a variable name that
has already been used in the data set. If you do, you will lose all the
original data (potential disaster!), so check your codebook.
3. Click on the Type and Label button. Click in the Label box and type in a
description of the scale (e.g. total optimism). Click on Continue.
4. From the list of variables on the left-hand side, click on the first item in
the scale (op1).
5. Click on the arrow button to move it into the Numeric Expression box.
6. Click on + on the calculator.
7. Repeat the process until all scale items appear in the box. In this example
we would select the unreversed items first (op3, op5) and then the
reversed items (obtained in the previous procedure), which are located at
the bottom of the list of variables (Rop2, Rop4, Rop6).
8. The complete numeric expression should read as follows:
op1+op3+op5+Rop2+Rop4+Rop6.
9. Double-check that all items are correct and that there are + signs in the
right places. Click OK (or on Paste if you wish to paste this command
to the Syntax Editor window). To execute it after pasting to the Syntax
Editor, highlight the command and select Run from the menu.
The syntax for this command is:
COMPUTE toptim = op1+op3+op5+Rop2+Rop4+Rop6 .
EXECUTE .
This will create a new variable called toptim at the end of your data set. Each
person’s score will be the sum of their scores on items op1 to op6 (using the reversed
items where necessary). If any items had missing data, the overall score will also be
missing. This is indicated by a full stop instead of a score in the data file.
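The pasted COMPUTE command does not include the label typed in step 3, so a VARIABLE LABELS line can be added to keep it in the syntax file. The sketch also shows the SUM function as an alternative to typing + signs. Be careful here: plain SUM(...) adds whatever items are present, so someone who skipped an item would get a lower total rather than a missing one. Writing SUM.6 requires all six items to be valid, which matches the behaviour described above. The name toptim2 is hypothetical, chosen so the existing toptim is not overwritten.

```spss
* Record the label from step 3 in syntax as well.
VARIABLE LABELS toptim 'total optimism'.

* Same total using SUM: the .6 suffix means all six items must be valid.
COMPUTE toptim2 = SUM.6(op1, op3, op5, Rop2, Rop4, Rop6).
EXECUTE.
```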
You will notice in the literature that some researchers go a step further and divide the
total scale score by the number of items in the scale. This can make it a little easier to
interpret the scores of the total scale because it is back in the original scale used for each
of the items (e.g. from 1 to 5 representing strongly disagree to strongly agree). To do this,
you also use the Transform, Compute Variable menu of SPSS. This time you will need to specify
a new variable name and then type in a suitable formula (e.g. toptim/6).
Always record details of any new variables that you create in your codebook.
Specify the new variable’s name, what it represents and full details of what was done
to calculate it. If any items were reversed, this should be specified along with details
of which items were added to create the score. It is also a good idea to include the
possible range of scores for the new variable in the codebook (see the Appendix). This
gives you a clear guide when checking for any out-of-range values.
After creating a new variable, it is important to run Descriptives on this new scale
to check that the values are appropriate (see Chapter 5). It also helps you get a feel for
the distribution of scores on your new variable.
• Check back with the questionnaire: what is the possible range of scores that could
be recorded? For a ten-item scale, using a response scale from 1 to 4, the minimum
value would be 10 and the maximum value would be 40. If a person answered 1 to
every item, that overall score would be 10 × 1 = 10. If a person answered 4 to each
item, that score would be 10 × 4 = 40.
• Check the output from Descriptives to ensure that there are no out-of-range
cases (see Chapter 5).
• Compare the mean score on the scale with values reported in the literature. Is
your value similar to that obtained in previous studies? If not, why not? Have you
done something wrong in the recoding? Or is your sample different from that
used in other studies?
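For the Optimism Scale used in this chapter (six items, each scored 1 to 5), the same arithmetic gives a possible range of 6 × 1 = 6 to 6 × 5 = 30. The sketch below runs Descriptives on toptim and then flags any total outside that range. The flag variable toptim_oor is a hypothetical name; ideally every valid case should have the value 0.

```spss
* Minimum, maximum and mean of the new total.
DESCRIPTIVES VARIABLES=toptim
  /STATISTICS=MEAN STDDEV MIN MAX.

* Flag totals outside the possible range of 6 to 30 (1 = out of range, 0 = fine).
COMPUTE toptim_oor = (toptim < 6 OR toptim > 30).
FREQUENCIES VARIABLES=toptim_oor.
```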
You should also run other analyses to check the distribution of scores on your new
total scale variable:
• Check the distribution of scores using skewness and kurtosis (see Chapter 6).
• Obtain a histogram of the scores and inspect the spread of scores. Are they
normally distributed? If not, you may need to consider ‘transforming’ the scores
for some analyses (this is discussed later in this chapter).
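Both of these checks can be requested in a single Frequencies command. In this sketch, which assumes the total is called toptim, /FORMAT=NOTABLE suppresses the long frequency table and NORMAL overlays a normal curve on the histogram.

```spss
* Skewness, kurtosis and a histogram with a normal curve for the new total.
FREQUENCIES VARIABLES=toptim
  /FORMAT=NOTABLE
  /STATISTICS=MEAN STDDEV SKEWNESS SESKEW KURTOSIS SEKURT
  /HISTOGRAM NORMAL.
```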
COLLAPSING A CONTINUOUS VARIABLE INTO GROUPS
For some analyses or when you have very skewed distributions, you may wish to divide
the sample into equal groups according to respondents’ scores on some variable (e.g.
to give low, medium and high scoring groups).
To illustrate this process, I will use the survey4ED.sav file that is included on the
website that accompanies this book (see p. viii and the Appendix for details). I will use
Visual Binning to identify suitable cut-off points to break the continuous variable age
into three approximately equal groups. The same technique could be used to create
a ‘median split’; that is, to divide the sample into two groups, using the median as
the cut-off point. Once the cut-off points are identified, Visual Binning will create
a new categorical variable that has only three values corresponding to the three age
ranges chosen. This technique leaves the original variable age, measured as a continuous
variable, intact so that you can use it for other analyses.
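Visual Binning is the approach described here. As an alternative sketch (a different procedure, not what Visual Binning pastes), the RANK command with NTILES assigns each case to a group based on its rank on age; NTILES(2) gives an approximate median split. When many people share the same age, the group boundaries may differ slightly from the cut-off points Visual Binning chooses. The names agegp3r and agemed are hypothetical.

```spss
* Alternative to Visual Binning: RANK groups cases by their rank on age.
RANK VARIABLES=age (A) /NTILES(3) INTO agegp3r /PRINT=NO.

* NTILES(2) gives an approximate median split.
RANK VARIABLES=age (A) /NTILES(2) INTO agemed /PRINT=NO.
```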
Procedure for collapsing a continuous variable into groups
1. From the menu at the top of the screen, click on Transform and choose
Visual Binning.
2. Select the continuous variable that you want to use (e.g. age). Transfer it
into the Variables to Bin box. Click on the Continue button.
3. In the Visual Binning screen, a histogram showing the distribution of age
scores should appear.
4. In the section at the top labelled Binned Variable, type the name for the
new categorical variable that you will create (e.g. Agegp3). You can also
change the suggested label that is shown (e.g. age in 3 groups).
5. Click on the button labelled Make Cutpoints. In the dialogue box that
appears, click on the option Equal Percentiles Based on Scanned Cases.
In the box Number of Cutpoints, specify a number one less than the
number of groups that you want (e.g. if you want three groups, type in
2 for cutpoints). In the Width (%) section below, you will then see 33.33
appear. This means that SPSS will try to put 33.3 per cent of the sample in
each group. Click on the Apply button.
6. Click on the Make Labels button back in the main dialogue box. This will
automatically generate value labels for each of the new groups created.
7. Click on OK (or on Paste if you wish to paste this command to the Syntax
Editor window). To execute it after pasting to the Syntax Editor, highlight
the command and select Run from the menu.
The syntax generated by this command is:
RECODE age
( MISSING = COPY )
( LO THRU 29 =1 )
( LO THRU 44 =2 )
( LO THRU HI = 3 )
( ELSE = SYSMIS ) INTO agegp3.
VARIABLE LABELS agegp3 'age in 3 groups'.
FORMATS agegp3 (F5.0).
VALUE LABELS agegp3
1 '<= 29'
2 '30 - 44'
3 '45+'.
MISSING VALUES agegp3 ( ).
VARIABLE LEVEL agegp3 ( ORDINAL ).
EXECUTE.
A new variable (Agegp3) should appear at the end of your data file. Go back
to your Data Editor window, choose the Variable View tab, and it should be
at the bottom. To check the number of cases in each of the categories of your
newly created variable (Agegp3), go to Analyze and select Descriptive Statistics, then
Frequencies.
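The same check as syntax is sketched below. The MEANS command is an extra step not mentioned in the text: showing the youngest and oldest person in each group makes it easy to confirm the cut-off points (29 and 44 in the syntax above).

```spss
* How many cases fell into each new age group.
FREQUENCIES VARIABLES=agegp3.

* Youngest and oldest person in each group, to confirm the cut-off points.
MEANS TABLES=age BY agegp3
  /CELLS=COUNT MIN MAX.
```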
COLLAPSING THE NUMBER OF CATEGORIES OF A CATEGORICAL VARIABLE
There are some situations where you may want to reduce or collapse the number of
categories of a categorical variable. You may want to do this for research or theoretical
reasons (e.g. collapsing the marital status into just two categories representing people
‘in a relationship’/‘not in a relationship’), or you may make the decision after looking
at the nature of the data. For example, after running Descriptive Statistics you may
find you have only a few people in your sample who fall into a particular category (e.g.
for our education variable, we only have two people in our first category, ‘primary
school’). As it stands, this variable could not appropriately be used in many of the
statistical analyses covered later in the book. We could decide just to remove these
people from the sample, or we could recode them to combine them with the next
category (some secondary school). We would have to relabel the variable so that it
represented people who did not complete secondary school.
The procedure for recoding a categorical variable is shown below. It is very important
to note that here we are creating a new additional variable (so that we keep our
original data intact).
Procedure for recoding a categorical variable
1. From the menu at the top of the screen, click on Transform, then
on Recode into Different Variables. (Make sure you select ‘different
variables’, as this retains the original variable for other analyses.)
2. Select the variable you wish to recode (e.g. educ). In the Name box, type
a name for the new variable that will be created (e.g. educrec). Type in
an extended label if you wish in the Label section. Click on the button
labelled Change.
3. Click on the button labelled Old and New Values.
4. In the section Old Value, you will see a box labelled Value. Type in the first
code or value of your current variable (e.g. 1). In the New Value section,
type in the new value that will be used (or, if the same one is to be used,
type that in). In this case I will recode to the same value, so I will type 1 in
both the Old Value and New Value sections. Click on the Add button.
5. For the second value, I would type 2 in the Old Value but in the New Value
I would type 1. This will recode all the values of both 1 and 2 from the
original coding into one group in the new variable to be created with a
value of 1.
6. For the third value of the original variable, I would type 3 in the Old
Value and 2 in the New Value. This is just to keep the values in the new
variable in sequence. Click on Add. Repeat for all the remaining values of
the original variable. In the table Old --> New, you should see the following
codes for this example: 1 --> 1; 2 --> 1; 3 --> 2; 4 --> 3; 5 --> 4; 6 --> 5.
7. Click on Continue and then on OK (or on Paste if you wish to paste this
command to the Syntax Editor window). To execute it after pasting to the
Syntax Editor, highlight the command and select Run from the menu.
8. Go to your Data Editor window and choose the Variable View tab. Type
in appropriate value labels to represent the new values (1=did not
complete high school, 2=completed high school, 3=some additional
training, 4=completed undergrad uni, 5=completed postgrad uni).
Remember, these will be different from the codes used for the original
variable, and it is important that you don’t mix them up.
The syntax generated by this command is:
RECODE
educ
(1=1) (2=1) (3=2) (4=3) (5=4) (6=5) INTO educrec .
EXECUTE .
When you recode a variable, make sure you run Frequencies on both the old
variable (educ) and the newly created variable (educrec, which appears at the end of
your data file). Check that the frequencies reported for the new variable are correct.
For example, for the newly created educrec variable, we should now have 2 + 53 = 55 in
the first group. This represents the two people who ticked 1 on the original variable
(primary school) and the 53 people who ticked 2 (some secondary school).
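Step 8 and the Frequencies check can also be done in syntax. In this sketch the labels are the ones listed in step 8. The cross-tabulation is an extra check, not mentioned above, that shows every original code alongside its new code in a single table.

```spss
* Step 8 in syntax: labels for the new codes (these differ from the educ codes).
VALUE LABELS educrec
  1 'did not complete high school'
  2 'completed high school'
  3 'some additional training'
  4 'completed undergrad uni'
  5 'completed postgrad uni'.

* Frequencies for old and new variables, plus a table of old codes by new codes.
FREQUENCIES VARIABLES=educ educrec.
CROSSTABS /TABLES=educ BY educrec.
```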
The Recode procedure demonstrated here could be used for a variety of purposes.
You may find later, when you come to do your statistical analyses, that you will need
to recode the values used for a variable. For example, in Chapter 14 (Logistic regression)
you may need to recode variables originally coded 1=yes, 2=no to a new coding
system 1=yes, 0=no. This can be achieved in the same way as described in the previous
procedures section. Just be very clear before you start on what your original values are,
and what you want the new values to be.
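A sketch of the 1=yes, 2=no to 1=yes, 0=no recode mentioned above. The variable names smoke and smokerec are hypothetical placeholders, not variables named in this chapter; substitute your own.

```spss
* smoke and smokerec are hypothetical names: 1=yes, 2=no becomes 1=yes, 0=no.
RECODE smoke (1=1) (2=0) INTO smokerec.
VALUE LABELS smokerec 1 'yes' 0 'no'.
EXECUTE.

* Check the new coding against the old.
CROSSTABS /TABLES=smoke BY smokerec.
```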
TRANSFORMING VARIABLES
Often when you check the distribution of scores on a scale or measure (e.g. self-esteem,
anxiety) you will find (to your dismay!) that the scores do not fall in a nice,
normally distributed curve. Sometimes scores will be positively skewed, where most
of the respondents record low scores on the scale (e.g. depression). Sometimes you
will fi nd a negatively skewed distribution, where most scores are at the high end (e.g.
self-esteem). Given that many of the parametric statistical tests assume normally
distributed scores, what do you do about these skewed distributions?
One of the choices you have is to abandon the use of parametric statistics (e.g.
Pearson correlation, analysis of variance) and instead choose to use non-parametric
alternatives (e.g. Spearman’s rho, Kruskal-Wallis). SPSS includes a number of useful
non-parametric techniques in its package. These are discussed in Chapter 16.
Another alternative, when you have a non-normal distribution, is to ‘transform’ your
variables. This involves mathematically modifying the scores using various formulas until
the distribution looks more normal. There are a number of different types of transformation,
depending on the shape of your distribution. There is considerable controversy
concerning this approach in the literature, with some authors strongly supporting, and
others arguing against, transforming variables to better meet the assumptions of the
various parametric techniques. For a discussion of the issues and the approaches to
transformation, you should read Chapter 4 in Tabachnick and Fidell (2007).
In Figure 8.2 some of the more common problems are represented, along with
the type of transformation recommended by Tabachnick and Fidell (2007, p. 87). You
should compare your distribution with those shown, and decide which picture it most
closely resembles. I have also given a nasty-looking formula beside each of the suggested
transformations. Don’t let this throw you—these are just formulas that SPSS will use
on your data, giving you a new, hopefully normally distributed variable to use in your
analyses. In the procedures section to follow, you will be shown the SPSS procedure for
this. Before attempting any of these transformations, however, it is important that you
read Tabachnick and Fidell (2007, Chapter 4), or a similar text, thoroughly.
Figure 8.2 Distribution of scores and suggested transformations
Square root (moderate positive skew)
Formula: new variable = SQRT (old variable)
Logarithm (substantial positive skew)
Formula: new variable = LG10 (old variable)
Inverse (severe positive skew)
Formula: new variable = 1 / (old variable)
Reflect and square root (moderate negative skew)
Formula: new variable = SQRT (K – old variable) where K = largest possible value + 1
Reflect and logarithm (substantial negative skew)
Formula: new variable = LG10 (K – old variable) where K = largest possible value + 1
Reflect and inverse (severe negative skew)
Formula: new variable = 1 / (K – old variable) where K = largest possible value + 1
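In the Compute Variable dialog, or in the Syntax Editor, these formulas become COMPUTE commands. The sketch below is illustrative only; in practice you would choose the single transformation that matches your distribution. It assumes tnegaff (the example used in the procedure that follows) for the positive-skew formulas and tslfest for the reflected ones, and it assumes the largest possible tslfest score is 40, so K = 41 (check your codebook). New names other than sqtnegaff are hypothetical. LG10 and the inverse need scores above zero; SPSS returns system-missing otherwise, which is one reason K is the largest value plus 1.

```spss
* Illustration only: in practice choose the ONE transformation that matches your distribution.

* Positive skew (tnegaff).
COMPUTE sqtnegaff = SQRT(tnegaff).
COMPUTE lgtnegaff = LG10(tnegaff).
COMPUTE invtnegaff = 1 / tnegaff.

* Negative skew (tslfest): reflect first, assuming a largest possible score of 40 so K = 41.
COMPUTE rsqtslfest = SQRT(41 - tslfest).
COMPUTE rlgtslfest = LG10(41 - tslfest).
COMPUTE rinvtslfest = 1 / (41 - tslfest).
EXECUTE.
```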
Procedure for transforming variables
1. From the menu at the top of the screen, click on Transform, then click on
Compute Variable.
2. Target Variable. In this box, type in a new name for the variable. Try to
include an indication of the type of transformation and the original name
of the variable. For example, for a variable called tnegaff I would make this
new variable sqtnegaff, if I had performed a square root. Be consistent in
the abbreviations that you use for each of your transformations.
3. Functions. A wide range of possible actions is listed here. You
need to choose the most appropriate transformation for your variable.
Look at the shape of your distribution; compare it with those in
Figure 8.2. Take note of the formula listed next to the picture that
matches your distribution. This is the one that you will use.
4. Transformations involving square root or logarithm. In the Function
group box, click on Arithmetic, and scan down the list that shows up in
the bottom box until you find the formula you need (e.g. Sqrt or Lg10).
Highlight the one you want and click on the up arrow. This moves the
formula into the Numeric Expression box. You will need to tell it which
variable you want to recalculate. Find it in the list of variables and click
on the arrow to move it into the Numeric Expression box. If you prefer,
you can just type the formula in yourself without using the Functions or
Variables list. Just make sure you spell everything correctly.
5. Transformations involving Reflect. You need to find the value K for
your variable. This is the largest value that your variable can have (see
your codebook) + 1. Type this number in the Numeric Expression box.
Complete the remainder of the formula using the Functions box, or
alternatively type it in yourself.
6. Transformations involving Inverse. To calculate the inverse, you need to
divide 1 by your scores. So, in the Numeric Expression box type in 1, then
type / and then your variable or the rest of your formula (e.g. 1/tslfest).
7. Check the final formula in the Numeric Expression box. Write this down
in your codebook next to the name of the new variable you created.
8. Click on the button Type and Label. Under Label, type in a brief description
of the new variable (or you may choose to use the actual formula you used).
9. Check in the Target Variable box that you have given your new variable
a new name, not the original one. If you accidentally put the old variable
name, you will lose all your original scores. So, always double-check.
10. Click on OK (or on Paste if you wish to paste this command to the Syntax
Editor window). To execute it after pasting to the Syntax Editor, highlight
the command and select Run from the menu. A new variable will be
created and will appear at the end of your data file.
11. Run Analyze, Frequencies to check the skewness and kurtosis values for
your old and new variables. Have they improved?
12. Under Frequencies, click on the Charts button and select Histogram
to inspect the distribution of scores on your new variable. Has the
distribution improved? If not, you may need to consider a different type
of transformation.
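Steps 11 and 12 can be combined in one command. This sketch assumes the original variable tnegaff and the square-root version sqtnegaff named in step 2; compare the skewness and kurtosis values, and the two histograms, side by side.

```spss
* Skewness, kurtosis and histograms for the original and transformed scores.
FREQUENCIES VARIABLES=tnegaff sqtnegaff
  /FORMAT=NOTABLE
  /STATISTICS=SKEWNESS SESKEW KURTOSIS SEKURT
  /HISTOGRAM NORMAL.
```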
If none of the transformations work, you may need to consider using non-parametric
techniques to analyse your data (see Chapter 16). Alternatively, for very skewed
variables you may wish to divide your continuous variable into a number of discrete
groups. Instructions for doing this are presented earlier in this chapter.
ADDITIONAL EXERCISES
Business
Data file: staffsurvey4ED.sav. See Appendix for details of the data file.
1. Practise the procedures described in this chapter to add up the total scores for
a scale using the items that make up the Staff Satisfaction Survey. You will need
to add together the items that assess agreement with each item in the scale (i.e.
Q1a+Q2a+Q3a … to Q10a). Name your new variable staffsatis.
2. Check the descriptive statistics for your new total score (staffsatis) and compare
this with the descriptives for the variable totsatis, which is already in your data file.
This is the total score that I have already calculated for you.
3. What are the minimum possible and maximum possible scores for this new
variable? Tip: check the number of items in the scale and the number of response
points on each item (see Appendix).
4. Check the distribution of the variable service by generating a histogram. You will
see that it is very skewed, with most people clustered down the low end (with less
than 2 years’ service) and a few people stretched up at the very high end (with
more than 30 years’ service). Check the shape of the distribution against those
displayed in Figure 8.2 and try a few different transformations. Remember to
check the distribution of the new transformed variables you create. Are any of the
new variables more normally’ distributed?
5. Collapse the years of service variable (service) into three groups using the Visual
Binning procedure from the Transform menu. Use the Make Cutpoints button
and ask for Equal Percentiles. In the section labelled Number of Cutpoints,
specify 2. Call your new variable gp3service to distinguish it from the variable
I have already created in the data file using this procedure (service3gp). Run
Frequencies on your newly created variable to check how many cases are in each
group.
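Exercise 5 asks for three groups cut at equal percentiles. As a rough illustration of what that means (a Python sketch, not SPSS's own algorithm), the two cutpoints sit at the 33.3rd and 66.7th percentiles, and a score equal to a cutpoint is placed in the lower group. The scores are hypothetical, and SPSS may compute its percentile cutpoints slightly differently, so group sizes can differ by a case or two.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles

def equal_percentile_bins(scores, n_groups=3):
    """Split scores into n_groups using n_groups - 1 percentile cutpoints.

    A score equal to a cutpoint goes into the lower group, i.e. each
    cutpoint is treated as an included upper endpoint (<=).
    """
    cutpoints = quantiles(scores, n=n_groups)
    # bisect_left counts the cutpoints lying strictly below each score
    groups = [bisect_left(cutpoints, x) + 1 for x in scores]
    return cutpoints, groups

# Hypothetical years-of-service scores (not taken from staffsurvey4ED.sav)
service = [1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 20, 31]
cutpoints, gp3service = equal_percentile_bins(service)
print("cutpoints:", [round(c, 2) for c in cutpoints])                  # [3.0, 9.33]
print("cases per group:", dict(sorted(Counter(gp3service).items())))  # {1: 6, 2: 4, 3: 5}
```

Because several people share a score of exactly 3, the groups are not equal in size, which is why running Frequencies on the new variable is a worthwhile check.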
Health
Data file: sleep4ED.sav. See Appendix for details of the data file.
1. Practise the procedures described in this chapter to add up the total scores for a
scale using the items that make up the Sleepiness and Associated Sensations Scale.
You will need to add together the items fatigue, lethargy, tired, sleepy, energy. Call
your new variable sleeptot. Please note: none of these items needs to be reversed
before being added.
2. Check the descriptive statistics for your new total score (sleeptot) and compare
them with the descriptives for the variable totSAS, which is already in your data
file. This is the total score that I have already calculated for you.
3. What are the minimum possible and maximum possible scores for this new
variable? Tip: check the number of items in the scale and the number of response
points on each item (see Appendix).
4. Check the distribution (using a histogram) of the variable that measures the
number of cigarettes smoked per day by the smokers in the sample (smokenum).
You will see that it is very skewed, with most people clustered at the low end
(with fewer than 10 per day) and a few people stretched out at the very high end
(with more than 70 per day). Check the shape of the distribution against those
displayed in Figure 8.2 and try a few different transformations. Remember to
check the distribution of the new transformed variables you create. Are any of the
new transformed variables more ‘normally’ distributed?
5. Collapse the age variable (age) into three groups using the Visual Binning procedure
from the Transform menu. Use the Make Cutpoints button and ask for
Equal Percentiles. In the section labelled Number of Cutpoints, specify 2. Call
your new variable gp3age to distinguish it from the variable I have already created
in the data file using this procedure (age3gp). Run Frequencies on your newly
created variable to check how many cases are in each group.
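Exercises 1 and 3 come down to simple arithmetic: the total is the sum of the items, and the possible range is the number of items multiplied by the lowest and highest response values. The Python sketch below illustrates both. It assumes, hypothetically, that each item is scored 1 to 10; check the Appendix for the actual number of response points before relying on the range it prints.

```python
ITEMS = ["fatigue", "lethargy", "tired", "sleepy", "energy"]

# ASSUMPTION: each item is scored 1-10; check the Appendix for the real
# number of response points on this scale.
LOWEST_RESPONSE, HIGHEST_RESPONSE = 1, 10

def total_score(case):
    """Sum the scale items for one person; None if any item is missing.

    Adding items with '+' in Compute Variable behaves the same way: if any
    one item is missing, the total is missing for that person.
    """
    values = [case.get(item) for item in ITEMS]
    if any(v is None for v in values):
        return None
    return sum(values)

min_possible = len(ITEMS) * LOWEST_RESPONSE
max_possible = len(ITEMS) * HIGHEST_RESPONSE

# One hypothetical respondent
case = {"fatigue": 7, "lethargy": 5, "tired": 8, "sleepy": 6, "energy": 4}
print("sleeptot =", total_score(case))                      # 30
print("possible range:", min_possible, "to", max_possible)  # 5 to 50
```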
9
Checking the reliability of a scale
When you are selecting scales to include in your study, it is important to find scales
that are reliable. There are a number of different aspects to reliability (see discussion
of this in Chapter 1). One of the main issues concerns the scale’s internal consistency.
This refers to the degree to which the items that make up the scale ‘hang together’.
Are they all measuring the same underlying construct? One of the most commonly
used indicators of internal consistency is Cronbach’s alpha coefficient. Ideally, the
Cronbach alpha coefficient of a scale should be above .7 (DeVellis 2003). Cronbach
alpha values are, however, quite sensitive to the number of items in the scale. With
short scales (e.g. scales with fewer than ten items) it is common to find quite low
Cronbach alpha values (e.g. .5). In this case, it may be more appropriate to report the mean
inter-item correlation for the items. Briggs and Cheek (1986) recommend an optimal
range for the inter-item correlation of .2 to .4.
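Why are short scales penalised? The standardised form of alpha (which SPSS reports alongside the raw value as Cronbach's Alpha Based on Standardized Items) depends only on the number of items, k, and the mean inter-item correlation, r̄: alpha = k r̄ / (1 + (k − 1) r̄). This small Python sketch holds r̄ at a hypothetical .3, inside the Briggs and Cheek range, and varies only the scale length.

```python
def standardized_alpha(k, mean_r):
    """Cronbach's alpha from the number of items (k) and mean inter-item r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Hold the mean inter-item correlation at .3 and vary the scale length
for k in (3, 5, 10, 20):
    print(f"{k:>2} items: alpha = {standardized_alpha(k, 0.3):.2f}")
```

With identical item quality, alpha rises from about .56 with three items to about .90 with twenty, which is why the mean inter-item correlation can be the fairer summary for a short scale.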
The reliability of a scale can vary depending on the sample. It is therefore necessary
to check that each of your scales is reliable with your particular sample. This information
is usually reported in the Method section of your research paper or thesis. If
your scale contains some items that are negatively worded (common in psychological
measures), these need to be ‘reversed’ before checking reliability. Instructions on how
to do this are provided in Chapter 8.
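Chapter 8 reverses items with the Recode procedure. The arithmetic behind it is simple: on a scale running from a lowest to a highest value, the reversed score is (lowest + highest) − score. A minimal Python sketch, assuming a hypothetical 1 to 5 response scale:

```python
def reverse_score(score, lowest=1, highest=5):
    """Reverse-score one response so that high becomes low and vice versa."""
    if not lowest <= score <= highest:
        raise ValueError(f"{score} is outside the {lowest}-{highest} response range")
    return (lowest + highest) - score

# ASSUMPTION: a 1-5 response scale (1 = strongly disagree ... 5 = strongly agree)
print([reverse_score(s) for s in [1, 2, 3, 4, 5]])  # [5, 4, 3, 2, 1]
# On a hypothetical 1-7 scale, a 2 becomes a 6
print(reverse_score(2, lowest=1, highest=7))        # 6
```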
Make sure that you check with the scale’s manual (or the journal article in which it
is reported) for instructions concerning the need to reverse items and for information
on any subscales. Sometimes scales contain a number of subscales that may or may
not be combined to form a total scale score. If necessary, the reliability of each of the
subscales and the total scale will need to be calculated.
If you are developing your own scale for use in your study, make sure you read
widely on the principles and procedures of scale development. There are some good
easy-to-read books on the topic, including Streiner & Norman (2008), DeVellis (2003)
and Kline (2005).
DETAILS OF EXAMPLE
To demonstrate this technique, I will be using the survey4ED.sav data file included on
the website accompanying this book. Full details of the study, the questionnaire and
scales used are provided in the Appendix. If you wish to follow along with the steps
described in this chapter, you should start SPSS and open the file survey4ED.sav. In
the procedure described below, I will explore the internal consistency of one of the
scales from the questionnaire. This is the Satisfaction with Life Scale (Pavot, Diener,
Colvin & Sandvik 1991), which is made up of five items. In the data file these items are
labelled as lifsat1, lifsat2, lifsat3, lifsat4, lifsat5.
Procedure for checking the reliability of a scale
Important: before starting, you should check that all negatively worded items
in your scale have been reversed (see Chapter 8). If you don’t do this, you will
find that you have very low (and incorrect) Cronbach alpha values. In this
example, none of the Satisfaction with Life Scale items needs to be rescored.
1. From the menu at the top of the screen, click on Analyze, select Scale,
then Reliability Analysis.
2. Click on all of the individual items that make up the scale (e.g. lifsat1,
lifsat2, lifsat3, lifsat4, lifsat5). Move these into the box marked Items.
3. In the Model section, make sure Alpha is selected.
4. In the Scale label box, type in the name of the scale or subscale (Life
Satisfaction).
5. Click on the Statistics button. In the Descriptives for section, select
Item, Scale, and Scale if item deleted. In the Inter-Item section, click on
Correlations. In the Summaries section, click on Correlations.
6. Click on Continue and then OK (or on Paste to save to Syntax Editor).
The syntax from this procedure is:
RELIABILITY
/VARIABLES=lifsat1 lifsat2 lifsat3 lifsat4 lifsat5
/SCALE('Life Satisfaction') ALL
/MODEL=ALPHA
/STATISTICS=DESCRIPTIVE SCALE CORR
/SUMMARY=TOTAL CORR .
The output generated from this procedure is shown below.
INTERPRETING THE OUTPUT FROM RELIABILITY
• Check that the number of cases is correct (in the Case Processing Summary table)
and that the number of items is correct (in the Reliability Statistics table).
• Check the Inter-Item Correlation Matrix for negative values. All values should
be positive, indicating that the items are measuring the same underlying characteristic.
The presence of negative values could indicate that some of the items
have not been correctly reverse scored. Incorrect scoring would also show up in
the Item-Total Statistics table as negative values in the Corrected Item-Total
Correlation column. These should be checked carefully if you obtain a lower than
expected Cronbach alpha value. (Check what other researchers report for the
scale.)
• Check the Cronbach’s Alpha value shown in the Reliability Statistics table. In
this example the value is .89, suggesting very good internal consistency reliability
for the scale with this sample. Values above .7 are considered acceptable; however,
values above .8 are preferable.
• The Corrected Item-Total Correlation values shown in the Item-Total Statistics
table give you an indication of the degree to which each item correlates with the
total score of the remaining items. Low values (less than .3) here indicate that the item
is measuring something different from the scale as a whole. If your scale’s overall Cronbach
alpha is too low (e.g. less than .7) and you have checked for incorrectly scored items, you
may need to consider removing items with low item-total correlations.
• In the column headed Alpha if Item Deleted, the impact of removing each item
from the scale is given. Compare these values with the final alpha value obtained. If
any of the values in this column are higher than the final alpha value, you may want
to consider removing this item from the scale. This is useful if you are developing a
scale, but if you are using established, validated scales, removal of items means that
you could not compare your results with other studies using the scale.
• For scales with a small number of items (e.g. fewer than 10), it is sometimes difficult
to get a decent Cronbach alpha value and you may wish to consider reporting the
mean inter-item correlation value, which is shown in the Summary Item Statistics
table. In this example the mean inter-item correlation is .63, with values ranging
from .48 to .76. This suggests quite a strong relationship among the items. For
many scales, this is not the case.
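If it helps to see where the numbers in these tables come from, the following Python sketch (an illustration outside SPSS, not the book's procedure) computes Cronbach's alpha, the mean and range of the inter-item correlations, the corrected item-total correlations (each item against the total of the other items) and alpha if item deleted. The responses are hypothetical, and item 5 has deliberately been left un-reversed so that you can see how an incorrectly scored item shows up. For the same data, the values should agree with SPSS to rounding.

```python
from itertools import combinations
from statistics import mean, variance

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cronbach_alpha(items):
    """items: one list of scores per item, all in the same respondent order."""
    k = len(items)
    totals = [sum(case) for case in zip(*items)]
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))

def reliability_report(items):
    """Needs at least three items so that 'alpha if deleted' is defined."""
    k = len(items)
    alpha = cronbach_alpha(items)
    rs = [pearson_r(items[i], items[j]) for i, j in combinations(range(k), 2)]
    print(f"Cronbach's alpha = {alpha:.2f}")
    print(f"Mean inter-item r = {mean(rs):.2f} "
          f"(range {min(rs):.2f} to {max(rs):.2f})")
    for i in range(k):
        rest = items[:i] + items[i + 1:]
        rest_total = [sum(case) for case in zip(*rest)]
        r_corrected = pearson_r(items[i], rest_total)  # item vs. other items' total
        alpha_deleted = cronbach_alpha(rest)
        flag = "  <- check" if r_corrected < .3 or alpha_deleted > alpha else ""
        print(f"item {i + 1}: corrected item-total r = {r_corrected:.2f}, "
              f"alpha if deleted = {alpha_deleted:.2f}{flag}")
    return alpha

# Hypothetical 1-5 responses from six people to five items; item 5 was
# deliberately left un-reversed, so it correlates negatively with the rest.
items = [
    [4, 5, 2, 3, 5, 1],
    [4, 4, 2, 3, 5, 2],
    [5, 5, 1, 3, 4, 2],
    [3, 4, 2, 2, 5, 1],
    [2, 3, 4, 3, 2, 4],
]
reliability_report(items)
```

Here alpha for all five items is about .74, but the Alpha if Item Deleted value for item 5 is about .96, and its corrected item-total correlation is negative: the pattern described above for an item that has not been reverse scored.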
PRESENTING THE RESULTS FROM RELIABILITY
You would normally report the internal consistency of the scales that you are using in
your research in the Method section of your report, under the heading Measures, or
Materials. After describing the scale (number of items, response scale used, history of
use), you should include a summary of reliability information reported by the scale
developer and other researchers, and then a sentence to indicate the results for your
sample. For example:
According to Pavot, Diener, Colvin and Sandvik (1991), the Satisfaction with Life
Scale has good internal consistency, with a reported Cronbach alpha coefficient
of .85. In the current study, the Cronbach alpha coefficient was .89.
ADDITIONAL EXERCISES
Business
Data file: staffsurvey4ED.sav. See Appendix for details of the data file.
1. Check the reliability of the Staff Satisfaction Survey, which is made up of the
agreement items in the data file: Q1a to Q10a. None of the items of this scale
needs to be reversed.
Health
Data file: sleep4ED.sav. See Appendix for details of the data file.
1. Check the reliability of the Sleepiness and Associated Sensations Scale, which is
made up of items fatigue, lethargy, tired, sleepy, energy. None of the items of this
scale needs to be reversed.
10
Choosing the right statistic
One of the most difficult (and potentially fear-inducing) parts of the research process
for most research students is choosing the correct statistical technique to analyse their
data. Although most statistics courses teach you how to calculate a correlation coefficient
or perform a t-test, they typically do not spend much time helping students learn
how to choose which approach is appropriate to address particular research questions.
In most research projects it is likely that you will use quite a variety of different types of
statistics, depending on the question you are addressing and the nature of the data that
you have. It is therefore important that you have at least a basic understanding of the
different statistics, the type of questions they address and their underlying assumptions
and requirements.
So, dig out your statistics texts and review the basic techniques and the principles
underlying them. You should also look through journal articles on your topic and
identify the statistical techniques used in these studies. Different topic areas may make
use of different statistical approaches, so it is important that you find out what other
researchers have done in terms of data analysis. Look for long, detailed journal articles
that clearly and simply spell out the statistics that were used. Collect these together in
a folder for handy reference. You might also find them useful later when considering
how to present the results of your analyses.
In this chapter we will look at the various statistical techniques that are available,
and I will then take you step by step through the decision-making process. If the whole
statistical process sends you into a panic, just think of it as choosing which recipe you
will use to cook dinner tonight. What ingredients do you have in the refrigerator,
what type of meal do you feel like (soup, roast, stir-fry, stew), and what steps do you
have to follow? In statistical terms, we will look at the type of research questions you
have, which variables you want to analyse, and the nature of the data itself. If you take
this process step by step, you will find the final decision is often surprisingly simple.
Once you have determined what you have and what you want to do, there often is only
one choice. The most important part of this whole process is clearly spelling out what
you have and what you want to do with it.
OVERVIEW OF THE DIFFERENT STATISTICAL TECHNIQUES
This section is broken into two main parts. First, we will look at the techniques used
to explore the relationship among variables (e.g. between age and optimism), followed
by techniques you can use when you want to explore the differences between groups
(e.g. sex differences in optimism scores). I have separated the techniques into these
two sections, as this is consistent with the way in which most basic statistics texts are
structured and how the majority of students will have been taught basic statistics. This
tends to somewhat artificially emphasise the difference between these two groups of
techniques. There are, in fact, many underlying similarities between the various statistical
techniques, which are perhaps not evident on initial inspection. A full discussion
of this point is beyond the scope of this book. If you would like to know more, I
would suggest you start by reading Chapter 17 of Tabachnick and Fidell (2007). That
chapter provides an overview of the General Linear Model, under which many of the
statistical techniques can be considered.
I have deliberately kept the summaries of the different techniques brief and simple,
to aid initial understanding. This chapter certainly does not cover all the different
techniques available, but it does give you the basics to get you started and to build
your confidence.
Exploring relationships
Often in survey research you will not be interested in differences between groups, but
instead in the strength of the relationship between variables. There are a number of
different techniques that you can use.
Correlation
Pearson correlation (or Spearman correlation, its non-parametric alternative) is used when
you want to explore the strength of the relationship between two continuous variables. This
gives you an indication of both the direction (positive or negative) and the strength of the relationship.
A positive correlation indicates that as one variable increases, so does the other. A
negative correlation indicates that as one variable increases, the other decreases. This
topic is covered in Chapter 11.
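For readers who like to see the calculation, here is a minimal Python sketch, outside SPSS and with hypothetical age and optimism scores, of the two coefficients named above. Spearman's rho is simply Pearson's r computed on the ranks of the scores, with tied scores given the average of the ranks they occupy.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Rank scores from 1 upwards; tied scores share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1   # average of 1-based positions
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson r computed on the ranks of the scores."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data for seven people
age = [18, 25, 31, 40, 52, 60, 71]
optimism = [14, 15, 19, 18, 22, 26, 25]
print(f"Pearson r    = {pearson_r(age, optimism):.2f}")
print(f"Spearman rho = {spearman_rho(age, optimism):.2f}")   # 0.93
```

Both values are positive here: as age increases, so does optimism.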
Partial correlation
Partial correlation is an extension of Pearson correlation—it allows you to control for
the possible effects of another confounding variable. Partial correlation ‘removes’ the
effect of the confounding variable (e.g. socially desirable responding), allowing you to
get a more accurate picture of the relationship between your two variables of interest.
Partial correlation is covered in Chapter 12.
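The ‘removing’ can be written down directly. For two variables x and y and a control variable z, the first-order partial correlation is (r_xy − r_xz r_yz) / √((1 − r_xz²)(1 − r_yz²)). The Python sketch below applies it to three hypothetical correlations; it is an illustration of the formula, not the SPSS procedure covered in Chapter 12.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations
r_xy = .50  # between the two variables of interest
r_xz = .40  # first variable with socially desirable responding
r_yz = .30  # second variable with socially desirable responding
print(f"partial r = {partial_r(r_xy, r_xz, r_yz):.2f}")
```

Here the correlation of .50 drops to about .43 once socially desirable responding is controlled for. If z were unrelated to both variables (r_xz = r_yz = 0), the partial correlation would equal the original r.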
Multiple regression
Multiple regression is a more sophisticated extension of correlation and is used when
you want to explore the predictive ability of a set of independent variables on one
continuous dependent measure. Different types of multiple regression allow you to
compare the predictive ability of particular independent variables and to find the best
set of variables to predict a dependent variable. See Chapter 13.
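With just two predictors, the standardised regression coefficients (betas) and R squared can be worked out from the three correlations among the variables, which shows why predictors that overlap explain less together than their separate r² values suggest. This is a hedged Python sketch using hypothetical correlations; with more predictors the same idea requires matrix algebra, which SPSS handles for you.

```python
def two_predictor_regression(r_y1, r_y2, r_12):
    """Standardised betas and R squared for y regressed on x1 and x2.

    Uses only the three correlations among y, x1 and x2.
    """
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared

# Hypothetical correlations: life satisfaction (y) with optimism (x1) = .50,
# with self-esteem (x2) = .40, and optimism with self-esteem = .30
b1, b2, r2 = two_predictor_regression(.50, .40, .30)
print(f"beta optimism = {b1:.2f}, beta self-esteem = {b2:.2f}, R squared = {r2:.2f}")
```

Separately, the two predictors would account for .25 and .16 of the variance, but because they overlap, together they explain about .32 rather than .41.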
Factor analysis
Factor analysis allows you to condense a large set of variables or scale items down to a
smaller, more manageable number of dimensions or factors. It does this by summarising
the underlying patterns of correlation and looking for ‘clumps’ or groups of closely
related items. This technique is often used when developing scales and measures, to
identify the underlying structure. See Chapter 15.
Summary
All of the analyses described above involve exploration of the relationship between
continuous variables. If you have only categorical variables, you can use the Chi Square
Test for Relatedness or Independence to explore their relationship (e.g. if you wanted
to see whether gender influenced clients’ dropout rates from a treatment program). In
this situation, you are interested in the number of people in each category (males and
females who drop out of/complete the program) rather than their score on a scale.
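Because chi-square works with counts rather than scale scores, the statistic compares the observed number of people in each cell with the number expected if the two variables were unrelated (row total × column total ÷ N). A minimal Python sketch of the Pearson chi-square with hypothetical dropout counts follows. For 2 by 2 tables, SPSS also reports a continuity-corrected value, which this sketch does not compute.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = males, females; columns = dropped out, completed
table = [[20, 30],
         [10, 40]]
chi2, df = chi_square_independence(table)
print(f"chi-square = {chi2:.2f}, df = {df}")
```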
Some additional techniques you should know about, but which are not covered in this
text, are described below. For more information on these, see Tabachnick and Fidell
(2007). These techniques are as follows:
• Discriminant function analysis is used when you want to explore the predictive ability
of a set of independent variables on one categorical dependent measure. That is,
you want to know which variables best predict group membership. The dependent
variable in this case is usually some clear criterion (passed/failed, dropped out of/
continued with treatment). See Chapter 9 in Tabachnick and Fidell (2007).
• Canonical correlation is used when you wish to analyse the relationship between
two sets of variables. For example, a researcher might be interested in how a
variety of demographic variables relate to measures of wellbeing and adjustment.
See Chapter 12 in Tabachnick and Fidell (2007).
• Structural equation modelling is a relatively new, and quite sophisticated, technique
that allows you to test various models concerning the interrelationships among a set
of variables. Based on multiple regression and factor analytic techniques, it allows you
to evaluate the importance of each of the independent variables in the model and to
test the overall fit of the model to your data. It also allows you to compare alternative
models. SPSS does not have a structural equation modelling module, but it does
support an ‘add-on’ called AMOS. See Chapter 14 in Tabachnick and Fidell (2007).
Exploring differences between groups
There is another family of statistics that can be used when you want to find out
whether there is a statistically significant difference among a number of groups. The
parametric versions of these tests, which are suitable when you have interval-scaled
data with normal distribution of scores, are presented below, along with the non-
parametric alternative.
T-tests
T-tests are used when you have two groups (e.g. males and females) or two sets of
data (before and after), and you wish to compare the mean score on some continuous
variable. There are two main types of t-tests. Paired sample t-tests (also called repeated
measures) are used when you are interested in changes in scores for participants tested
at Time 1, and then again at Time 2 (often after some intervention or event). The
samples are ‘related’ because they are the same people tested each time. Independent
sample t-tests are used when you have two different (independent) groups of people
(males and females), and you are interested in comparing their scores. In this case,
you collect information on only one occasion but from two different sets of people.
T-tests are covered in Chapter 17. The non-parametric alternatives, Mann-Whitney
U Test and Wilcoxon Signed Rank Test, are presented in Chapter 16.
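The difference between the two types of t-test is easiest to see in the calculation. The independent samples version compares two group means using a pooled variance, while the paired version works on each person's change score. This Python sketch uses hypothetical scores and shows only the equal-variances form of the independent t-test; SPSS also reports a version for when equal variances are not assumed.

```python
from math import sqrt
from statistics import mean, stdev

def independent_t(group1, group2):
    """Student's t for two independent groups, assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2
                  + (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

def paired_t(time1, time2):
    """t for the same people tested twice: a one-sample t on the differences."""
    diffs = [b - a for a, b in zip(time1, time2)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical scores for two different groups of people
males = [22, 25, 19, 24, 20]
females = [18, 21, 17, 20, 19]
t_ind, df_ind = independent_t(males, females)
print(f"independent samples: t({df_ind}) = {t_ind:.2f}")

# Hypothetical scores for the same five people at Time 1 and Time 2
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 16, 12]
t_pair, df_pair = paired_t(before, after)
print(f"paired samples:      t({df_pair}) = {t_pair:.2f}")
```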
One-way analysis of variance
One-way analysis of variance is similar to a t-test, but is used when you have two or more
groups and you wish to compare their mean scores on a continuous variable. It is called
one-way because you are looking at the impact of only one independent variable on your
dependent variable. A one-way analysis of variance (ANOVA) will let you know whether
your groups differ, but it won’t tell you where the significant difference is (gp1/gp3,
gp2/gp3 etc.). You can conduct post-hoc comparisons to find out which groups are
significantly different from one another. You could also choose to test differences between
specific groups, rather than comparing all the groups, by using planned comparisons.
Similar to t-tests, there are two types of one-way ANOVAs: repeated measures ANOVA
(same people on more than two occasions), and between-groups (or independent
samples) ANOVA, where you are comparing the mean scores of two or more different
groups of people. One-way ANOVA is covered in Chapter 18, while the non-parametric
alternatives (Kruskal-Wallis Test and Friedman Test) are presented in Chapter 16.
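The overall F value from a between-groups one-way ANOVA is the ratio of the variability between the group means to the variability within the groups, each divided by its degrees of freedom. The Python sketch below uses three small hypothetical groups. As the text says, a significant F only tells you that the groups differ somewhere; post-hoc or planned comparisons are still needed to locate the difference.

```python
from statistics import mean

def one_way_anova(*groups):
    """Between-groups one-way ANOVA: returns F, df_between, df_within."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical optimism scores for three groups
gp1, gp2, gp3 = [18, 20, 22], [21, 23, 25], [26, 28, 30]
f, df_b, df_w = one_way_anova(gp1, gp2, gp3)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```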
Two-way analysis of variance
Two-way analysis of variance allows you to test the impact of two independent variables
on one dependent variable. The advantage of using a two-way ANOVA is that it
allows you to test for an interaction effect, that is, when the effect of one independent
variable is influenced by another; for example, when you suspect that optimism
increases with age, but only for males.
It also tests for ‘main effects’—that is, the overall effect of each independent
variable (e.g. sex, age). There are two different two-way ANOVAs: between-groups
ANOVA (when the groups are different) and repeated measures ANOVA (when the
same people are tested on more than one occasion). Some research designs combine
both between-groups and repeated measures in the one study. These are referred to
as ‘Mixed Between-Within Designs’, or ‘Split Plot’. Two-way ANOVA is covered in
Chapter 19. Mixed designs are covered in Chapter 20.
Multivariate analysis of variance
Multivariate analysis of variance (MANOVA) is used when you want to compare
your groups on a number of different, but related, dependent variables; for example,
comparing the effects of different treatments on a variety of outcome measures (e.g.
anxiety, depression). Multivariate ANOVA can be used with one-way, two-way and
higher factorial designs involving one, two or more independent variables. MANOVA
is covered in Chapter 21.
Analysis of covariance
Analysis of covariance (ANCOVA) is used when you want to statistically control for
the possible effects of an additional confounding variable (covariate). This is useful
when you suspect that your groups differ on some variable that may influence the
effect that your independent variables have on your dependent variable. To be sure
that it is the independent variable that is doing the influencing, ANCOVA statistically
removes the effect of the covariate. Analysis of covariance can be used as part of a
one-way, two-way or multivariate design. ANCOVA is covered in Chapter 22.
THE DECISION-MAKING PROCESS
Having had a look at the variety of choices available, it is time to choose which techniques
are suitable for your needs. In choosing the right statistic, you will need to
consider a number of different factors. These include consideration of the type of
question you wish to address, the type of items and scales that were included in your
questionnaire, the nature of the data you have available for each of your variables and
the assumptions that must be met for each of the different statistical techniques. I
have set out below a number of steps that you can use to navigate your way through
the decision-making process.
Step 1: What questions do you want to address?
Write yourself a full list of all the questions you would like to answer from your
research. You might find that some questions could be asked a number of different
ways. For each of your areas of interest, see if you can present your question in a
number of different ways. You will use these alternatives when considering the different
statistical approaches you might use. For example, you might be interested in the
effect of age on optimism. There are a number of ways you could ask the question:
Is there a relationship between age and level of optimism?
Are older people more optimistic than younger people?
These two different questions require different statistical techniques. The question of
which is more suitable may depend on the nature of the data you have collected. So,
for each area of interest, detail a number of different questions.
Step 2: Find the questionnaire items and scales that you will use to address these questions
The type of items and scales that were included in your study will play a large part
in determining which statistical techniques are suitable to address your research
questions. That is why it is so important to consider the analyses that you intend to
use when first designing your study. For example, the way in which you collected
information about respondents’ age (see example in Step 1) will determine which
statistics are available for you to use. If you asked people to tick one of two options
(under 35/over 35), your choice of statistics would be very limited because there
are only two possible values for your variable age. If, on the other hand, you asked
people to give their age in years, your choices are broadened because you can have
scores varying across a wide range of values, from 18 to 80+. In this situation, you
may choose to collapse the range of ages down into a smaller number of categories
for some analyses (ANOVA), but the full range of scores is also available for other
analyses (e.g. correlation).
If you administered a questionnaire or survey for your study, go back to the
specific questionnaire items and your codebook and find each of the individual questions
(e.g. age) and total scale scores (e.g. optimism) that you will use in your analyses.
Identify each variable, how it was measured, how many response options there were
and the possible range of scores.
If your study involved an experiment, check how each of your dependent and
independent variables was measured. Did the scores on the variable consist of the
number of correct responses, an observer’s rating of a specific behaviour, or the length
of time a subject spent on a specific activity? Whatever the nature of the study, just be
clear that you know how each of your variables was measured.
Step 3: Identify the nature of each of your variables
The next step is to identify the nature of each of your variables. In particular, you need
to determine whether each of your variables is an independent variable or a dependent
variable. This information comes not from your data but from your understanding
of the topic area, relevant theories and previous research. It is essential that you are
clear in your own mind (and in your research questions) concerning the relationship
between your variables—which ones are doing the influencing (independent) and
which ones are being affected (dependent). There are some analyses (e.g. correlation)
where it is not necessary to specify which variables are independent and dependent.
For other analyses, such as ANOVA, it is important that you have this clear. Drawing
a model of how you see your variables relating is often useful here (see Step 4,
discussed next).
It is also important that you know the level of measurement for each of your variables.
Different statistics are required for variables that are categorical and continuous,
so it is important to know what you are working with. Are your variables:
• categorical (also referred to as nominal level data, e.g. sex: male/female)?
• ordinal (rankings: 1st, 2nd, 3rd)?
• continuous (also referred to as interval level data, e.g. age in years, or scores on the
Optimism Scale)?
There are some occasions when you might want to change the level of measurement
for particular variables. You can collapse continuous variable responses down into a
smaller number of categories (see Chapter 8). For example, age can be broken down
into different categories (e.g. under 35/over 35). This can be useful if you want to
conduct an ANOVA. It can also be used if your continuous variables do not meet
some of the assumptions for particular analyses (e.g. very skewed distributions).
Summarising the data does have some disadvantages, however, as you lose information.
By ‘lumping’ people together, you can sometimes miss important differences.
So you need to weigh up the benefits and disadvantages carefully.
Additional information required for continuous and categorical variables
For continuous variables, you should collect information on the distribution of scores
(e.g. are they normally distributed or are they badly skewed?). What is the range of
scores? (See Chapter 6 for the procedures to do this.) If your variable involves categories
(e.g. group 1/group 2, males/females), find out how many people fall into each
category (are the groups equal or very unbalanced?). Are some of the possible categories
empty? (See Chapter 6.) All of this information that you gather about your
variables here will be used later to narrow down the choice of statistics to use.
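For categorical variables, the check described above amounts to counting cases in every possible category, including the ones nobody chose. Chapter 6 does this in SPSS with Frequencies; the Python sketch below shows the same idea with a hypothetical education variable, listing each response option and flagging the empty categories.

```python
from collections import Counter

def category_summary(values, possible_categories):
    """Count cases in each possible category and list the empty ones."""
    counts = Counter(values)
    for category in possible_categories:
        print(f"{category:<14} n = {counts[category]}")
    empty = [c for c in possible_categories if counts[c] == 0]
    return counts, empty

# Hypothetical highest-education responses; 'primary' and 'postgraduate'
# were response options on the questionnaire, but nobody chose them.
education = ["secondary", "trade", "secondary", "undergraduate",
             "secondary", "trade", "undergraduate", "secondary"]
options = ["primary", "secondary", "trade", "undergraduate", "postgraduate"]
counts, empty = category_summary(education, options)
print("empty categories:", empty)
```

Empty or very small categories are a signal to consider combining groups before running analyses that compare them.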
Step 4: Draw a diagram for each of your research questions
I often find that students are at a loss for words when trying to explain what they
are researching. Sometimes it is easier, and clearer, to summarise the key points in a
diagram. The idea is to pull together some of the information you have collected in
Steps 1 and 2 above in a simple format that will help you choose the correct statistical
technique to use, or to choose among a number of different options.
One of the key issues you should be considering is: am I interested in the relation-
ship between two variables, or am I interested in comparing two groups of participants?
Summarising the information that you have, and drawing a diagram for each question,
may help clarify this for you. I will demonstrate by setting out the information and
drawing diagrams for a number of different research questions.
Question 1: Is there a relationship between age and level of optimism?
Variables:
Age—continuous: age in years from 18 to 80.
Optimism—continuous: scores on the Optimism Scale, ranging from 6 to 30.
From your literature review you hypothesise that older people are more optimistic
than younger people. This relationship between two continuous variables could be
illustrated as follows:
[Diagram: scatterplot with Optimism on the vertical axis and Age on the horizontal axis]
If you expected optimism scores to increase with age, you would place the points
starting low on the left and moving up towards the right. If you predicted that
optimism would decrease with age, then your points would start high on the left-hand
side and would fall as you moved towards the right.
Question 2: Are males more optimistic than females?
Variables:
Sex—independent, categorical (two groups): males/females.
Optimism—dependent, continuous: scores on the Optimism Scale, ranging from
6 to 30.
The results from this question, with one categorical variable (with only two groups)
and one continuous variable, could be summarised as follows:
Males Females
Mean optimism score
Question 3: Is the effect of age on optimism different for males and females?
If you wished to investigate the joint effects of age and gender on optimism scores,
you might decide to break your sample into three age groups (under 30, 31–49 years
and 50+).
Variables:
Sex—independent, categorical: males/females.
Age—independent, categorical: participants divided into three equal groups.
Optimism—dependent, continuous: scores on the Optimism Scale, ranging from
6 to 30.
The diagram might look like this:
                                          Age
Mean optimism score      Under 30      31–49      50 years and over
Males
Females
Question 4: How much of the variance in life satisfaction can be explained by a set of personality factors (self-esteem, optimism, perceived control)?
Perhaps you are interested in comparing the predictive ability of a number of different independent variables for a dependent measure, and in how much of the variance in your dependent variable is explained by the set of independent variables.
Variables:
• Self-esteem (independent, continuous).
• Optimism (independent, continuous).
• Perceived control (independent, continuous).
• Life satisfaction (dependent, continuous).
Choosing the right statistic 111
Your diagram might look like this:
[Diagram: Self-esteem, Optimism and Perceived control, each shown as a predictor of Life satisfaction]
Step 5: Decide whether a parametric or a non-parametric statistical technique is appropriate
Just to confuse research students even further, the wide variety of available statistical techniques is classified into two main groups: parametric and non-parametric. Parametric statistics are more powerful, but they do have more ‘strings attached’; that is, they make more stringent assumptions about the data. For example, they assume that the underlying distribution of scores in the population from which you have drawn your sample is normal.
Each of the parametric techniques (such as t-tests, ANOVA and Pearson correlation) also has additional assumptions of its own. It is important that you check these before you conduct your analyses. The specific assumptions are listed for each of the techniques covered in the remaining chapters of this book.
What if you don’t meet the assumptions for the statistical technique that you want to
use? Unfortunately, in social science research this is a common situation. Many of the
attributes we want to measure are in fact not normally distributed. Some are strongly
skewed, with most scores falling at the low end (e.g. depression); others are skewed so
that most of the scores fall at the high end of the scale (e.g. self-esteem).
If your data don’t meet the assumptions of the statistic you wish to use, you have a number of options, described below.
Option 1
You can use the parametric technique anyway and hope that it does not seriously invalidate your findings. Some statistics writers argue that most of the approaches are fairly ‘robust’; that is, they will tolerate minor violations of assumptions, particularly if you have a good-sized sample. If you decide to go ahead with the analysis anyway, you will need to justify this in your write-up, so collect useful quotes from statistics writers, previous researchers etc. to support your decision. Check journal articles on your topic area, particularly those that have used the same scales. Do they mention similar problems? If so, what have these other authors done? For a simple, easy-to-follow review of the robustness of different tests, see Cone and Foster (2006).
Option 2
You may be able to manipulate your data so that the assumptions of the statistical
test (e.g. normal distribution) are met. Some authors suggest ‘transforming’ your
variables if their distribution is not normal (see Chapter 8). There is some controversy
concerning this approach, so make sure you read up on this so that you can justify
what you have done (see Tabachnick & Fidell 2007).
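To see what a transformation does to a skewed variable, here is a minimal sketch in Python (not SPSS syntax, and not the procedure described in Chapter 8) that computes skewness before and after a log(x + 1) transformation. The scores are invented to mimic a positively skewed measure such as depression, log(x + 1) is only one of several transformations in use, and the skewness formula here is the simple unadjusted moment version, so its values may differ slightly from those SPSS reports.

```python
import math

def skewness(values):
    """Unadjusted moment skewness: third central moment divided by SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Invented, positively skewed scores: most are low, with a long tail of high values.
scores = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 13, 21]

# log(x + 1) is one common choice for positive skew; adding 1 avoids log(0).
logged = [math.log(x + 1) for x in scores]

print(f"skewness before: {skewness(scores):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```

The transformed scores are noticeably less skewed, but remember that any results then refer to the transformed variable, which is part of why the approach is debated.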
Option 3
The other alternative when you don’t meet parametric assumptions is to use a non-parametric technique instead. For many of the commonly used parametric techniques, there is a corresponding non-parametric alternative. These still come with some assumptions, but less stringent ones. These non-parametric alternatives (e.g. Kruskal-Wallis, Mann-Whitney U, Chi-square) tend to be less powerful; that is, they may be less sensitive in detecting a relationship or a difference among groups. Some of the more commonly used non-parametric techniques are covered in Chapter 16.
Step 6: Making the final decision
Once you have collected the necessary information concerning your research questions, the level of measurement for each of your variables and the characteristics of the data you have available, you are finally in a position to consider your options. In the text below, I have summarised the key elements of some of the major statistical approaches you are likely to encounter. Scan down the list, find an example of the type of research question you want to address and check that you have all the necessary ingredients. Also consider whether there might be other ways you could ask your question and use a different statistical approach. I have included a summary table at the end of this chapter to help with the decision-making process.
Seek out additional information on the techniques you choose to use to ensure
that you have a good understanding of their underlying principles and their assump-
tions. It is a good idea to use a number of different sources for this process: different
authors have different opinions. You should have an understanding of the contro-
versial issues—you may even need to justify the use of a particular statistic in your
situation—so make sure you have read widely.
KEY FEATURES OF THE MAJOR STATISTICAL TECHNIQUES
This section is divided into two parts:
1. techniques used to explore relationships among variables (covered in Part Four of this book)
2. techniques used to explore differences among groups (covered in Part Five of this book).
Exploring relationships among variables
Chi-square for independence
Example of research question: What is the relationship between gender and dropout
rates from therapy?
What you need:
• one categorical independent variable (e.g. sex: males/females)
• one categorical dependent variable (e.g. dropout: Yes/No).
You are interested in the number of people in each category (not scores on a scale).
Diagram:
[Table: number of people in each cell, with columns for Males and Females and rows for Dropout: Yes and Dropout: No]
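The chi-square statistic compares the observed count in each cell with the count you would expect if the two variables were unrelated (row total × column total ÷ grand total). The sketch below computes the Pearson chi-square for a table of counts; it is written in Python purely for illustration, the counts are invented, and no continuity correction is applied.

```python
def chi_square_independence(table):
    """Pearson chi-square for a contingency table given as a list of rows of counts.
    No Yates continuity correction is applied."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Invented counts: rows = dropout Yes / No, columns = males / females.
counts = [[30, 10],
          [70, 90]]
chi2, df = chi_square_independence(counts)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

These invented counts give a chi-square of 12.5 with 1 degree of freedom, well above 3.84, the critical value at p=.05.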
Correlation
Example of research question: Is there a relationship between age and optimism
scores? Does optimism increase with age?
What you need: two continuous variables (e.g. age, optimism scores).
Diagram:
[Scatterplot: Optimism on the vertical axis, Age on the horizontal axis]
Non-parametric alternative: Spearman’s Rank Order Correlation.
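Spearman’s rho is simply the Pearson correlation computed on ranks rather than raw scores, with tied scores given the average of the ranks they span. A minimal Python sketch of that idea, using invented age and optimism scores:

```python
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

# Invented scores for six people.
age = [20, 25, 30, 40, 50, 60]
optimism = [10, 12, 11, 20, 25, 29]
print(f"rho = {spearman_rho(age, optimism):.3f}")
```

Because only the order of the scores matters, rho is unaffected by how far apart the scores are, which is why it suits ordinal data.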
Partial correlation
Example of research question: After controlling for the effects of socially desirable responding, is there still a significant relationship between optimism and life satisfaction scores?
What you need: Three continuous variables (e.g. optimism, life satisfaction, socially
desirable responding).
Non-parametric alternative: None.
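With a single control variable, the partial correlation can be worked out from the three ordinary (zero-order) correlations among the variables. The Python sketch below applies that standard formula; the correlation values are invented for illustration.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented zero-order correlations.
r_opt_ls = 0.50  # optimism with life satisfaction
r_opt_sd = 0.40  # optimism with social desirability
r_ls_sd = 0.30   # life satisfaction with social desirability
print(f"partial r = {partial_r(r_opt_ls, r_opt_sd, r_ls_sd):.3f}")
```

With these values the partial correlation is about .43, a little lower than the zero-order correlation of .50, because part of the original relationship was shared with social desirability.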
Multiple regression
Example of research question: How much of the variance in life satisfaction scores
can be explained by the following set of variables: self-esteem, optimism and perceived
control? Which of these variables is the best predictor of life satisfaction?
What you need:
• one continuous dependent variable (e.g. life satisfaction)
• two or more continuous independent variables (e.g. self-esteem, optimism, perceived control).
Diagram:
[Diagram: Self-esteem, Optimism and Perceived control, each shown as a predictor of Life satisfaction]
Non-parametric alternative: None.
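Multiple regression finds the weights for the independent variables that minimise the squared prediction errors, and R squared is the proportion of variance in the dependent variable that those predictions explain. The Python sketch below solves the least-squares normal equations directly so that it needs only the standard library. The data are invented, and it leaves out what SPSS also reports (standardised beta coefficients and significance tests), which you would need in order to judge which predictor is best.

```python
def solve(a, b):
    """Solve a x = b (a is a square list of lists) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_r_squared(predictors, y):
    """Least-squares fit of y on the predictors plus an intercept; returns (coefficients, R squared)."""
    rows = [[1.0] + list(case) for case in zip(*predictors)]  # one row per case
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_residual / ss_total

# Invented scores for eight people.
self_esteem = [20, 25, 30, 22, 35, 28, 32, 26]
optimism = [15, 18, 22, 16, 25, 20, 24, 19]
control = [10, 12, 9, 14, 15, 11, 13, 12]
life_sat = [18, 22, 26, 20, 31, 24, 29, 23]

coefs, r_squared = ols_r_squared([self_esteem, optimism, control], life_sat)
print("intercept and b weights:", [round(c, 3) for c in coefs])
print(f"R squared = {r_squared:.3f}")
```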
Exploring differences between groups
Independent-samples t-test
Example of research question: Are males more optimistic than females?
What you need:
• one categorical independent variable with only two groups (e.g. sex: males/females)
• one continuous dependent variable (e.g. optimism score).
Participants can belong to only one group.
Diagram:
[Table: mean optimism score, with one column for Males and one for Females]
Non-parametric alternative: Mann-Whitney U Test.
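The independent-samples t statistic is the difference between the two group means divided by its standard error. This Python sketch computes the equal-variances (pooled) version only; the optimism scores are invented, and SPSS also reports a version that does not assume equal variances.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Variance with the n - 1 denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def independent_t(group1, group2):
    """Student's t for two independent groups, using the pooled variance."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * sample_variance(group1) + (n2 - 1) * sample_variance(group2)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, df

# Invented optimism scores.
males = [22, 25, 19, 24, 27, 21]
females = [20, 18, 23, 17, 21, 19]
t, df = independent_t(males, females)
print(f"t({df}) = {t:.2f}")
```

Each person contributes to only one group here, which is exactly the "participants can belong to only one group" requirement.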
Paired-samples t-test (repeated measures)
Example of research question: Does ten weeks of meditation training result in a
decrease in participants’ level of anxiety? Is there a change in anxiety levels from Time 1
(pre-intervention) to Time 2 (post-intervention)?
What you need:
• one categorical independent variable (e.g. Time 1/Time 2)
• one continuous dependent variable (e.g. anxiety score).
Same participants tested on two separate occasions: Time 1 (before intervention) and
Time 2 (after intervention).
Diagram:
[Table: mean anxiety score, with one column for Time 1 and one for Time 2]
Non-parametric alternative: Wilcoxon Signed Rank Test.
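In a paired-samples t-test, each person’s change score (Time 2 minus Time 1) is computed first, and the mean change is then compared with zero. A minimal Python sketch with invented anxiety scores:

```python
import math

def paired_t(time1, time2):
    """Paired-samples t: mean of the change scores divided by its standard error."""
    diffs = [post - pre for pre, post in zip(time1, time2)]  # Time 2 minus Time 1
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Invented anxiety scores for the same five people before and after training.
time1 = [40, 38, 45, 50, 42]
time2 = [35, 36, 40, 44, 41]
t, df = paired_t(time1, time2)
print(f"t({df}) = {t:.2f}")
```

The negative t (about –3.92 here) simply reflects that anxiety went down from Time 1 to Time 2.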
One-way between-groups analysis of variance
Example of research question: Is there a difference in optimism scores for people who are under 30, 31–49, and 50 years and over?
What you need:
• one categorical independent variable with two or more groups (e.g. age: under 30/31–49/50+)
• one continuous dependent variable (e.g. optimism score).
Diagram:
[Table: mean optimism score for each age group: Under 30, 31–49, and 50 years and over]
Non-parametric alternative: Kruskal-Wallis Test.
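One-way ANOVA compares the variability between the group means with the variability of scores within the groups; F is the ratio of the two mean squares. A minimal Python sketch with invented optimism scores for the three age groups:

```python
def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Invented optimism scores.
under_30 = [18, 20, 22]
age_31_49 = [21, 23, 25]
age_50_plus = [24, 26, 28]
f, df_b, df_w = one_way_anova(under_30, age_31_49, age_50_plus)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

With these scores F = 6.75 with 2 and 6 degrees of freedom. A significant F only tells you that the groups differ somewhere; follow-up comparisons are needed to find where.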
Two-way between-groups analysis of variance
Example of research question: What is the effect of age on optimism scores for males
and females?
What you need:
• two categorical independent variables (e.g. sex: males/females; age group: under 30/31–49/50+)
• one continuous dependent variable (e.g. optimism score).
Diagram:
[Table: mean optimism score, with rows for Males and Females and columns for the age groups Under 30, 31–49 and 50 years and over]
Non-parametric alternative: None.
Note: analysis of variance can also be extended to include three or more independent
variables (usually referred to as Factorial Analysis of Variance).
Mixed between-within analysis of variance
Example of research question: Which intervention (maths skills/confidence building) is more effective in reducing participants’ fear of statistics, measured across three periods (pre-intervention, post-intervention, three-month follow-up)?
What you need:
• one between-groups independent variable (e.g. type of intervention)
• one within-groups independent variable (e.g. Time 1, Time 2, Time 3)
• one continuous dependent variable (e.g. scores on the Fear of Statistics Test).
Diagram:
[Table: mean score on the Fear of Statistics Test, with rows for the maths skills intervention and the confidence-building intervention and columns for Time 1, Time 2 and Time 3]
Non-parametric alternative: None.
Multivariate analysis of variance
Example of research question: Are males better adjusted than females in terms of their general physical and psychological health (as measured by anxiety and depression levels and perceived stress)?
What you need:
• one categorical independent variable (e.g. sex: males/females)
• two or more continuous dependent variables (e.g. anxiety, depression, perceived stress).
Diagram:
[Table: mean scores on Anxiety, Depression and Perceived stress (rows), with one column for Males and one for Females]
Non-parametric alternative: None.
Note: multivariate analysis of variance can be used with one-way (one independent
variable), two-way (two independent variables) and higher-order factorial designs.
Covariates can also be included.
Analysis of covariance
Example of research question: Is there a significant difference in the Fear of Statistics Test scores for participants in the maths skills group and the confidence-building group, while controlling for their pre-test scores on this test?
What you need:
• one categorical independent variable (e.g. type of intervention)
• one continuous dependent variable (e.g. Fear of Statistics Test scores at Time 2)
• one or more continuous covariates (e.g. Fear of Statistics Test scores at Time 1).
Non-parametric alternative: None.
Note: analysis of covariance can be used with one-way (one independent variable),
two-way (two independent variables) and higher-order factorial designs, and with
multivariate designs (two or more dependent variables).
FURTHER READINGS
The statistical techniques discussed in this chapter are only a small sample of all the
different approaches that you can take to data analysis. It is important that you are
aware of the existence, and potential uses, of a wide variety of techniques in order to
choose the most suitable one for your situation. Read as widely as you can.
For coverage of the basic techniques (t-test, analysis of variance, correlation), go back to your basic statistics texts, for example Cooper and Schindler (2003); Gravetter and Wallnau (2004); Peat (2001); Runyon, Coleman and Pittenger (2000); Norman and Streiner (2000). If you would like more detailed information, particularly on multivariate statistics, see Hair, Black, Babin, Anderson and Tatham (2006) or Tabachnick and Fidell (2007).
Summary table of the characteristics of the main statistical techniques
Purpose: Exploring relationships
Example of question: What is the relationship between gender and dropout rates from therapy?
  Parametric statistic: None
  Non-parametric alternative: Chi-square (Chapter 16)
  Independent variable: One categorical variable (e.g. Sex: M/F)
  Dependent variable: One categorical variable (e.g. Dropout/complete therapy: Yes/No)
  Essential features: The number of cases in each category is considered, not scores
Example of question: Is there a relationship between age and optimism scores?
  Parametric statistic: Pearson product-moment correlation coefficient (r) (Chapter 11)
  Non-parametric alternative: Spearman’s Rank Order Correlation (rho) (Chapter 11)
  Variables: Two continuous variables (e.g. Age, Optimism scores)
  Essential features: One sample with scores on two different measures, or same measure at Time 1 and Time 2
Example of question: After controlling for the effects of socially desirable responding bias, is there still a relationship between optimism and life satisfaction?
  Parametric statistic: Partial correlation (Chapter 12)
  Non-parametric alternative: None
  Variables: Two continuous variables and one continuous variable for which you wish to control (e.g. Optimism, life satisfaction, scores on a social desirability scale)
  Essential features: One sample with scores on two different measures, or same measure at Time 1 and Time 2
Example of question: How much of the variance in life satisfaction scores can be explained by self-esteem, perceived control and optimism? Which of these variables is the best predictor?
  Parametric statistic: Multiple regression (Chapter 13)
  Non-parametric alternative: None
  Independent variable: Set of two or more continuous independent variables (e.g. Self-esteem, perceived control, optimism)
  Dependent variable: One continuous dependent variable (e.g. Life satisfaction)
  Essential features: One sample with scores on all measures
Example of question: What is the underlying structure of the items that make up the Positive and Negative Affect Scale? How many factors are involved?
  Parametric statistic: Factor analysis (Chapter 15)
  Non-parametric alternative: None
  Variables: Set of related continuous variables (e.g. Items of the Positive and Negative Affect Scale)
  Essential features: One sample, multiple measures
Purpose: Comparing groups
Example of question: Are males more likely to drop out of therapy than females?
  Parametric statistic: None
  Non-parametric alternative: Chi-square (Chapter 16)
  Independent variable: One categorical independent variable (e.g. Sex)
  Dependent variable: One categorical dependent variable (e.g. Dropout/complete therapy)
  Essential features: You are interested in the number of people in each category, not scores on a scale
Example of question: Is there a change in participants’ anxiety scores from Time 1 to Time 2?
  Parametric statistic: Paired samples t-test (Chapter 17)
  Non-parametric alternative: Wilcoxon Signed Rank Test (Chapter 16)
  Independent variable: One categorical independent variable with two levels (e.g. Time 1/Time 2)
  Dependent variable: One continuous dependent variable (e.g. Anxiety score)
  Essential features: Same people on two different occasions
Example of question: Is there a difference in optimism scores for people who are under 35 yrs, 36–49 yrs and 50+ yrs?
  Parametric statistic: One-way between-groups ANOVA (Chapter 18)
  Non-parametric alternative: Kruskal-Wallis Test (Chapter 16)
  Independent variable: One categorical independent variable with three or more levels (e.g. Age group)
  Dependent variable: One continuous dependent variable (e.g. Optimism score)
  Essential features: Three or more groups: different people in each group
Example of question: Is there a change in participants’ anxiety scores across Time 1, Time 2 and Time 3?
  Parametric statistic: One-way repeated measures ANOVA (Chapter 18)
  Non-parametric alternative: Friedman Test (Chapter 16)
  Independent variable: One categorical independent variable with three or more levels (e.g. Time 1/Time 2/Time 3)
  Dependent variable: One continuous dependent variable (e.g. Anxiety score)
  Essential features: Three or more groups: same people on three or more occasions
Example of question: Is there a difference in the optimism scores for males and females, who are under 35 yrs, 36–49 yrs and 50+ yrs?
  Parametric statistic: Two-way between-groups ANOVA (Chapter 19)
  Non-parametric alternative: None
  Independent variable: Two categorical independent variables, each with two or more levels (e.g. Age group, Sex)
  Dependent variable: One continuous dependent variable (e.g. Optimism score)
  Essential features: Two or more groups for each independent variable: different people in each group
Example of question: Which intervention (maths skills/confidence building) is more effective in reducing participants’ fear of statistics, measured across three time periods?
  Parametric statistic: Mixed between-within ANOVA (Chapter 20)
  Non-parametric alternative: None
  Independent variable: One between-groups independent variable (two or more levels) and one within-groups independent variable (two or more levels) (e.g. Type of intervention, Time)
  Dependent variable: One continuous dependent variable (e.g. Fear of Statistics Test scores)
  Essential features: Two or more groups with different people in each group, each measured on two or more occasions
Example of question: Is there a difference between males and females, across three different age groups, in terms of their scores on a variety of adjustment measures (anxiety, depression and perceived stress)?
  Parametric statistic: Multivariate ANOVA (MANOVA) (Chapter 21)
  Non-parametric alternative: None
  Independent variable: One or more categorical independent variables, each with two or more levels (e.g. Age group, Sex)
  Dependent variable: Two or more related continuous dependent variables (e.g. Anxiety, depression and perceived stress scores)
Example of question: Is there a significant difference in the Fear of Stats Test scores for participants in the maths skills group and the confidence-building group, while controlling for their scores on this test at Time 1?
  Parametric statistic: Analysis of covariance (ANCOVA) (Chapter 22)
  Non-parametric alternative: None
  Independent variable: One or more categorical independent variables (two or more levels) and one continuous covariate (e.g. Type of intervention; Fear of Stats Test scores at Time 1 as the covariate)
  Dependent variable: One continuous dependent variable (e.g. Fear of Stats Test scores at Time 2)
PART FOUR
Statistical techniques to explore relationships among variables
In the chapters included in this section, we will be looking at some of the techniques available in SPSS for exploring relationships among variables. Our focus is on detecting and describing relationships among variables. All of the techniques covered here are based on correlation. Correlational techniques are often used by researchers engaged in non-experimental research designs. Unlike in experimental designs, variables are not deliberately manipulated or controlled; they are described as they exist naturally. These techniques can be used to:
• explore the association between pairs of variables (correlation)
• predict scores on one variable from scores on another variable (bivariate regression)
• predict scores on a dependent variable from scores on a number of independent variables (multiple regression)
• identify the structure underlying a group of related variables (factor analysis).
This family of techniques is used to test models and theories, predict outcomes and assess the reliability and validity of scales.
TECHNIQUES COVERED IN PART FOUR
There is a range of techniques available in SPSS to explore relationships. These vary
according to the type of research question that needs to be addressed and the type of
data available. In this book, however, only the most commonly used techniques are
covered.
Correlation (Chapter 11) is used when you wish to describe the strength and direction of the relationship between two variables (usually continuous). It can also be used when one of the variables is dichotomous, that is, it has only two values (e.g. sex: males/females). The statistic obtained is Pearson’s product-moment correlation (r). The statistical significance of r is also provided.
Partial correlation (Chapter 12) is used when you wish to explore the relationship between two variables while statistically controlling for a third variable. This is useful when you suspect that the relationship between your two variables of interest may be influenced, or confounded, by the impact of a third variable. Partial correlation statistically removes the influence of the third variable, giving a cleaner picture of the actual relationship between your two variables.
Multiple regression (Chapter 13) allows prediction of a single continuous dependent variable from a group of independent variables. It can be used to test the predictive power of a set of variables and to assess the relative contribution of each individual variable.
Logistic regression (Chapter 14) is used instead of multiple regression when your
dependent variable is categorical. It can be used to test the predictive power of a set of
variables and to assess the relative contribution of each individual variable.
Factor analysis (Chapter 15) is used when you have a large number of related variables (e.g. the items that make up a scale) and you wish to explore the underlying structure of this set of variables. It is useful in reducing a large number of related variables to a smaller, more manageable, number of dimensions or components.
In the remainder of this introduction to Part Four I will review some of the basic principles of correlation that are common to all the techniques covered in Part Four. This material should be reviewed before you attempt to use any of the procedures covered in this section.
REVISION OF THE BASICS
Correlation coefficients (e.g. Pearson product-moment correlation) provide a numerical summary of the direction and the strength of the linear relationship between two variables. Pearson correlation coefficients (r) can range from –1 to +1. The sign in front indicates whether there is a positive correlation (as one variable increases, so too does the other) or a negative correlation (as one variable increases, the other decreases). The size of the absolute value (ignoring the sign) provides information on the strength of the relationship. A perfect correlation of 1 or –1 indicates that the value of one variable can be determined exactly by knowing the value on the other variable. On the other hand, a correlation of 0 indicates no linear relationship between the two variables: knowing the value of one of the variables provides no assistance in predicting the value of the second variable with a straight-line rule.
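As a quick illustration of the –1 to +1 range, here is a minimal Python sketch of the Pearson formula (the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squares), applied to small invented data sets:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # perfect positive relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))  # perfect negative relationship
print(pearson_r(x, [3, 1, 5, 1, 3]))   # no linear relationship
```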
The relationship between variables can be inspected visually by generating
a scatterplot. This is a plot of each pair of scores obtained from the participants
in the sample. Scores on the first variable are plotted along the X (horizontal) axis and
the corresponding scores on the second variable are plotted on the Y (vertical) axis.
An inspection of the scatterplot provides information on both the direction of the
relationship (positive or negative) and the strength of the relationship (this is demon-
strated in more detail in Chapter 11). A scatterplot of a perfect correlation (r=1 or –1)
would show a straight line. A scatterplot when r=0, however, would show a circle or
blob of points, with no pattern evident.
Factors to consider when interpreting a correlation coefficient
There are a number of things you need to be careful of when interpreting the results
of a correlation analysis, or other techniques based on correlation. Some of the key
issues are outlined below, but I would suggest you go back to your statistics books and
review this material (see, for example, Gravetter & Wallnau 2004, pp. 520–76).
Non-linear relationship
The correlation coefficient (e.g. Pearson r) provides an indication of the linear (straight-line) relationship between variables. In situations where the two variables are related in a non-linear fashion (e.g. curvilinear), Pearson r will seriously underestimate the strength of the relationship. Always check the scatterplot, particularly if you obtain low values of r.
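A perfectly predictable but U-shaped relationship shows how badly r can miss a curvilinear pattern. In this invented Python example, y is exactly (x – 3) squared, so knowing x tells you y precisely, yet Pearson r comes out at zero:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

x = [0, 1, 2, 3, 4, 5, 6]
y = [(xi - 3) ** 2 for xi in x]  # 9, 4, 1, 0, 1, 4, 9
r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

A scatterplot of these points would show the curve immediately, which is why the scatterplot check matters.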
Outliers
Outliers (values that are substantially lower or higher than the other values in the data set) can have a dramatic effect on the correlation coefficient, particularly in small samples. In some circumstances outliers can make the r value much higher than it should be, and in other circumstances they can result in an underestimate of the true relationship. A scatterplot can be used to check for outliers: just look for values that are sitting out on their own. These could be due to a data entry error (typing 11 instead of 1), a careless answer from a respondent, or a true value from a rather strange individual! If you find an outlier, you should check for errors and correct them if appropriate. You may also need to consider removing or recoding the offending value to reduce the effect it is having on the r value (see Chapter 6 for a discussion of outliers).
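To see how much a single case can matter in a small sample, the Python sketch below computes r for six invented cases and then again after adding one extreme case (which might, for example, be a data entry error):

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Six invented cases with essentially no relationship.
x = [2, 3, 4, 5, 6, 7]
y = [5, 3, 6, 2, 5, 4]
r_without = pearson_r(x, y)

# The same cases plus one extreme case.
r_with = pearson_r(x + [20], y + [20])
print(f"r without the outlier = {r_without:.2f}")
print(f"r with the outlier    = {r_with:.2f}")
```

In this invented example r moves from about –.11 to about .93 because of one case.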
Restricted range of scores
You should always be careful interpreting correlation coefficients when they come from only a small subsection of the possible range of scores (e.g. using university students to study IQ). Correlation coefficients from studies using a restricted range of cases often differ from, and are typically weaker than, those from studies where the full range of possible scores is sampled. In order to provide an accurate and reliable indicator of the strength of the relationship between two variables, there should be as wide a range of scores on each of the two variables as possible. If you are involved in studying extreme groups (e.g. clients with high levels of anxiety), you should not try to generalise any correlation beyond the range of the variable used in the sample.
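The effect of a restricted range can be shown with a deterministic, invented data set in which y equals x plus an alternating +10/–10 error term. Computing r over all 100 cases, and then over only the middle 20, shows the correlation shrinking even though the underlying relationship is identical:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Invented, deterministic data: y is x plus an alternating +10/-10 error term.
x = list(range(100))
y = [xi + (10 if xi % 2 == 0 else -10) for xi in x]

r_full = pearson_r(x, y)
# Keep only the middle of the range (x from 40 to 59).
r_restricted = pearson_r(x[40:60], y[40:60])
print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

Here r drops from about .94 to about .44: the error term is the same size in both cases, but it is large relative to the narrow spread of x in the restricted sample.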
Correlation versus causality
Correlation provides an indication that there is a relationship between two vari-
ables; it does not, however, indicate that one variable causes the other. The correlation
between two variables (A and B) could be due to the fact that A causes B, that B causes
A, or (just to complicate matters) that an additional variable (C) causes both A and
B. The possibility of a third variable that influences both of your observed variables should always be considered. To illustrate this point, there is the famous story of the strong correlation that one researcher found between ice-cream consumption and the number of homicides reported in New York City. Does eating ice-cream cause people to become violent? No. Both variables (ice-cream consumption and crime rate) were influenced by the weather. During very hot spells, both ice-cream consumption and the crime rate increased. Despite the positive correlation obtained, this did not prove that eating ice-cream causes homicidal behaviour. Just as well: the ice-cream manufacturers would very quickly be out of business!
The warning here is clear: watch out for the possible influence of a third, confounding variable when designing your own study. If you suspect that other variables might influence your result, see if you can measure these at the same time. By using partial correlation (described in Chapter 12) you can statistically control for these additional variables, and therefore gain a clearer, and less contaminated, indication of the relationship between your two variables of interest.
Statistical versus practical significance
Don’t get too excited if your correlation coefficients are ‘significant’. With large samples, even quite small correlation coefficients (e.g. r=.2) can reach statistical significance. Although such a correlation may be statistically significant, its practical significance is very limited. You should focus on the actual size of Pearson’s r and the amount of shared variance between the two variables. The amount of shared variance can be calculated by squaring the value of the correlation coefficient (e.g. .2 × .2 = .04 = 4 per cent shared variance).
To interpret the strength of your correlation coefficient, you should also take into
account other research that has been conducted in your particular topic area. If other
researchers in your area have been able to predict only 9 per cent of the variance
(r=.3) in a particular outcome (e.g. anxiety), then your study that explains 25 per cent
(r=.5) would be impressive in comparison. In other topic areas, 25 per cent of the
variance explained may seem small and irrelevant.
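The Python sketch below makes both points concrete: r squared gives the shared variance, and the usual test statistic for r, t = r × √(n – 2) / √(1 – r²) with n – 2 degrees of freedom, grows with sample size even though r stays at .2. The sample sizes are chosen purely for illustration.

```python
import math

def shared_variance(r):
    """Proportion of variance shared by two variables: r squared."""
    return r ** 2

def t_for_r(r, n):
    """t statistic (df = n - 2) for testing whether Pearson r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

for r in (0.2, 0.3, 0.5):
    print(f"r = {r}: shared variance = {shared_variance(r):.0%}")

for n in (30, 100, 1000):
    print(f"r = .2, n = {n}: t({n - 2}) = {t_for_r(0.2, n):.2f}")
```

With n=30 the t of about 1.08 falls short of the two-tailed .05 critical value (about 2.05 for 28 degrees of freedom), whereas with n=100 the t of about 2.02 just exceeds it (about 1.98 for 98 degrees of freedom), even though only 4 per cent of the variance is shared in both cases.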
Assumptions
There are a number of assumptions common to all the techniques covered in
Part Four. These are discussed below. You will need to refer back to these assumptions
when performing any of the analyses covered in Chapters 11, 12, 13, 14 and 15.
Level of measurement
The scale of measurement for the variables for most of the techniques covered in
Part Four should be interval or ratio (continuous). One exception to this is if you
have one dichotomous independent variable (with only two values, e.g. sex) and one
continuous dependent variable. You should, however, have roughly the same number
of people or cases in each category of the dichotomous variable.
Spearman’s rho, a correlation coefficient suitable for ordinal or ranked data, is included in Chapter 11, along with its parametric alternative, the Pearson correlation coefficient. Rho is commonly used in the health and medical literature, and is also increasingly being used in psychology research as researchers become more aware of the potential problems of assuming that ordinal level ratings (e.g. Likert scales) approximate interval level scaling.
Related pairs
Each subject must provide a score on both variable X and variable Y (related pairs).
Both pieces of information must be from the same subject.
Independence of observations
The observations that make up your data must be independent of one another. That is, each observation or measurement must not be influenced by any other observation or measurement. Violation of this assumption, according to Stevens (1996, p. 238), is very serious. There are a number of research situations that may violate this assumption of independence. Examples of some such studies are described below (these are drawn from Stevens 1996, p. 239; and Gravetter & Wallnau 2004, p. 251):
• Studying the performance of students working in pairs or small groups. The behaviour of each member of the group influences all other group members, thereby violating the assumption of independence.
• Studying the TV-watching habits and preferences of children drawn from the same family. The behaviour of one child in the family (e.g. watching Program A) is likely to affect all children in that family; therefore the observations are not independent.
• Studying teaching methods within a classroom and examining the impact on students’ behaviour and performance. In this situation, all students could be influenced by the presence of a small number of trouble-makers; therefore individual behavioural or performance measurements are not independent.
Any situation where the observations or measurements are collected in a group
setting, or participants are involved in some form of interaction with one another,
should be considered suspect. In designing your study, you should try to ensure that all
observations are independent. If you suspect some violation of this assumption, Stevens
(1996, p. 241) recommends that you set a more stringent alpha value (e.g. p<.01).
There are more complex statistical techniques that can be used for data that
involve non-independent samples (e.g. children within different classrooms, within
different schools). This approach involves multilevel modelling, which is beyond the
scope of this book.
Normality
Scores on each variable should be normally distributed. This can be checked by
inspecting the histograms of scores on each variable (see Chapter 6 for instructions).
Linearity
The relationship between the two variables should be linear. This means that when you
look at a scatterplot of scores you should see a straight line (roughly), not a curve.
Homoscedasticity
The variability in scores for variable X should be similar at all values of variable Y.
Check the scatterplot (see Chapter 6 for instructions). It should show a fairly even
cigar shape along its length.
Missing data
When you are doing research, particularly with human beings, it is very rare that you
will obtain complete data from every case. It is thus important that you inspect your
Statistical techniques to explore relationships among variables 127
data file for missing data. Run Descriptives and find out what percentage of values is
missing for each of your variables. If you find a variable with a lot of unexpected missing
data, you need to ask yourself why. You should also consider whether your missing
values are happening randomly, or whether there is some systematic pattern (e.g. lots
of women failing to answer the question about their age). SPSS has a Missing Value
Analysis procedure that may help find patterns in your missing values.
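To make the "percentage missing" check concrete, here is a minimal Python sketch (not SPSS syntax) of what it involves. The data set, the variable names and the helper functions `percent_missing` and `percent_missing_by_group` are hypothetical; `None` stands in for a missing response. The second function illustrates the kind of systematic pattern mentioned above, where missing ages are concentrated among women.

```python
# Hypothetical data: None marks a missing response.
data = {
    "age":   [34, None, 29, None, 41, 38],
    "sex":   ["F", "F", "M", "F", "M", "M"],
    "optim": [22, 18, None, 25, 30, 27],
}

def percent_missing(values):
    """Percentage of entries in one variable that are missing (None)."""
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)

def percent_missing_by_group(values, groups):
    """Percentage missing within each group, to look for a systematic pattern."""
    result = {}
    for g in sorted(set(groups)):
        subset = [v for v, gg in zip(values, groups) if gg == g]
        result[g] = percent_missing(subset)
    return result

for name, values in data.items():
    print(f"{name}: {percent_missing(values):.1f}% missing")

# Are the missing ages spread evenly across the sexes?
print(percent_missing_by_group(data["age"], data["sex"]))
```

In this made-up example all of the missing ages come from women, which is exactly the kind of non-random pattern that should make you ask why.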
You also need to consider how you will deal with missing values when you come
to do your statistical analyses. The Options button in many of the SPSS statistical
procedures offers you choices for how you want SPSS to deal with missing data. It is
important that you choose carefully, as it can have dramatic effects on your results.
This is particularly important if you are including a list of variables and repeating the
same analysis for all variables (e.g. correlations among a group of variables, t-tests for
a series of dependent variables).
• The Exclude cases listwise option will include a case in the analysis only if it has
full data on all of the variables listed in your variables box. A case will
be totally excluded from all the analyses if it is missing even one piece of
information. This can severely, and unnecessarily, limit your sample size.
• The Exclude cases pairwise option, however, excludes the cases (persons) only
if they are missing the data required for the specific analysis. They will still be
included in any of the analyses for which they have the necessary information.
• The Replace with mean option, which is available in some SPSS statistical
procedures (e.g. multiple regression), calculates the mean value for the variable and
gives every missing case this value. This option should never be used as it can
severely distort the results of your analysis, particularly if you have a lot of missing
values.
Always press the Options button for any statistical procedure you conduct and
check which of these options is ticked (the default option varies across procedures).
I would strongly recommend that you use pairwise exclusion of missing data, unless
you have a pressing reason to do otherwise. The only situation where you might
need to use listwise exclusion is when you want to refer only to a subset of cases that
provided a full set of results.
Strange-looking numbers
In your output, you may come across some strange-looking numbers that take the
form 1.24E-02. These small values are presented in scientific notation: 1.24E-02 means
1.24 × 10⁻², or 0.0124. To prevent this happening, choose Edit from the main menu bar,
select Options, and make sure there is a tick in the box No scientific notation for small numbers in tables on the
General tab.
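If you need to convert such a value by hand, Python's float parser uses the same "E" convention, so the sketch below (using only built-in features) shows the conversion in both directions. It is simply a check on the arithmetic; it is not part of SPSS.

```python
# 1.24E-02 means 1.24 x 10 to the power -2, i.e. move the decimal point two places left.
value = float("1.24E-02")
print(value)            # 0.0124
print(f"{value:.4f}")   # 0.0124 (fixed-point display)
print(f"{value:.2E}")   # 1.24E-02 (back to scientific notation)
```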
11
Correlation
Correlation analysis is used to describe the strength and direction of the linear
relationship between two variables. There are a number of different statistics available
from SPSS, depending on the level of measurement and the nature of your data. In
this chapter, the procedure for obtaining and interpreting a Pearson product-moment
correlation coefficient (r) is presented along with Spearman Rank Order Correlation
(rho). Pearson r is designed for interval level (continuous) variables. It can also be
used if you have one continuous variable (e.g. scores on a measure of self-esteem)
and one dichotomous variable (e.g. sex: M/F). Spearman rho is designed for use with
ordinal level or ranked data and is particularly useful when your data does not meet
the criteria for Pearson correlation.
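To show what the two coefficients actually compute, here is a minimal Python sketch using the standard textbook definitions; it is not SPSS's own code. Pearson r is the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. Spearman rho is Pearson r calculated on the ranks, with tied scores given the average of the ranks they occupy. The function names and the `hours`/`score` data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # sum of squares for x
    syy = sum((b - my) ** 2 for b in y)                   # sum of squares for y
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

hours = [1, 2, 3, 4, 5]       # hypothetical scores
score = [1, 4, 9, 16, 100]    # always increasing, but not in a straight line
print(round(pearson_r(hours, score), 3))  # well below 1 because the trend is curved
print(spearman_rho(hours, score))         # 1.0: the rank orders agree perfectly
```

A dichotomous variable coded 0/1 (e.g. sex) can be passed to `pearson_r` like any other list of numbers, which is the situation described above where one variable is continuous and the other dichotomous.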
SPSS will calculate two types of correlation for you. First, it will give you a simple
bivariate correlation (which just means between two variables), also known as zero-
order correlation. SPSS will also allow you to explore the relationship between two
variables while controlling for another variable. This is known as partial correlation.
In this chapter, the procedure to obtain a bivariate Pearson r and non-parametric
Spearman rho is presented. Partial correlation is covered in Chapter 12.
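As a preview of what "controlling for another variable" means, the standard first-order partial correlation can be written in terms of the three zero-order correlations: r_xy.z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The Python sketch below evaluates that formula. The function name and the correlation values are hypothetical, and SPSS itself computes partial correlations directly from your data file.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the influence of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations.
print(round(partial_r(0.50, 0.50, 0.50), 3))  # 0.333: the x-y relationship weakens
print(round(partial_r(0.30, 0.60, 0.50), 3))  # 0.0: z accounts for the whole x-y association
```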
Pearson correlation coefficients (r) can only take on values from –1 to +1. The
sign out the front indicates whether there is a positive correlation (as one variable
increases, so too does the other) or a negative correlation (as one variable increases,
the other decreases). The size of the absolute value (ignoring the sign) provides an
indication of the strength of the relationship. A perfect correlation of 1 or –1 indi-
cates that the value of one variable can be determined exactly by knowing the value
on the other variable. A scatterplot of this relationship would show a straight line. On
the other hand, a correlation of 0 indicates no linear relationship between the two variables.
Knowing the value on one of the variables provides no assistance in predicting the
value on the second variable. A scatterplot would show a circle of points, with no
pattern evident.
There are a number of issues associated with the use of correlation. These include
the effect of non-linear relationships, outliers, restriction of range, correlation versus
causality and statistical versus practical significance. These topics are discussed in the
introduction to Part Four of this book. I would strongly recommend that you read
through