IBM SPSS Statistics 24 Core System
User's Guide
IBM
Note
Before using this information and the product it supports, read the information in “Notices” on page 289.
Product Information
This edition applies to version 24, release 0, modification 0 of IBM SPSS Statistics and to all subsequent releases and
modifications until otherwise indicated in new editions.
Contents
Chapter 1. Overview ......... 1
Windows ............... 1
Designated window versus active window ... 1
Variable names and variable labels in dialog box lists 2
Data type, measurement level, and variable list icons 2
Statistics Coach ............. 2
Finding out more ............. 3
Chapter 2. Getting Help ........ 5
Chapter 3. Data files ......... 7
Opening data files ............ 7
To open data files............ 7
Data file types ............. 7
Reading Excel Files ........... 8
Reading Older Excel Files and Other Spreadsheets 9
Reading dBASE files........... 9
Reading Stata files ........... 9
Reading CSV Files ........... 10
Text Wizard ............. 10
Reading Database Files ......... 14
Reading Cognos BI data ......... 20
Reading Cognos TM1 data ........ 22
Reading Data Collection Data ....... 23
File information ............. 25
Saving data files ............. 25
To save modified data files ........ 25
To save data files in code page character
encoding .............. 25
Saving data files in external formats ..... 26
Saving data files in Excel format ...... 29
Saving data files in SAS format....... 29
Saving data files in Stata format ...... 30
Saving Subsets of Variables ........ 31
Encrypting data files .......... 31
Exporting to a Database ......... 32
Exporting to Data Collection........ 38
Exporting to Cognos TM1 ........ 38
Comparing datasets ........... 40
Compare Datasets: Compare tab ...... 40
Compare Datasets: Attributes tab ...... 41
Comparing datasets: Output tab ...... 41
Protecting original data .......... 42
Virtual Active File ............ 42
Creating a Data Cache.......... 43
Chapter 4. Distributed Analysis Mode 45
Server Login .............. 45
Adding and Editing Server Login Settings ... 45
To Select, Switch, or Add Servers ...... 46
Searching for Available Servers ....... 47
Opening Data Files from a Remote Server .... 47
File Access in Local and Distributed Analysis Mode 47
Availability of Procedures in Distributed Analysis
Mode ................ 48
Absolute versus Relative Path Specifications ... 48
Chapter 5. Data Editor ........ 51
Data View ............... 51
Variable View.............. 51
To display or define variable attributes .... 52
Variable names ............ 52
Variable measurement level ........ 53
Variable type ............. 53
Variable labels ............ 55
Value labels ............. 55
Inserting line breaks in labels ....... 55
Missing values ............ 55
Roles ............... 56
Column width ............ 56
Variable alignment ........... 56
Applying variable definition attributes to
multiple variables ........... 56
Custom Variable Attributes ........ 57
Customizing Variable View ........ 59
Spell checking ............ 60
Entering data .............. 60
To enter numeric data .......... 60
To enter non-numeric data ........ 61
To use value labels for data entry ...... 61
Data value restrictions in the data editor ... 61
Editing data .............. 61
Replacing or modifying data values ..... 61
Cutting, copying, and pasting data values ... 61
Inserting new cases........... 62
Inserting new variables ......... 62
To change data type .......... 63
Finding cases, variables, or imputations ..... 63
Finding and replacing data and attribute values .. 64
Obtaining Descriptive Statistics for Selected
Variables ............... 64
Case selection status in the Data Editor ..... 65
Data Editor display options ......... 65
Data Editor printing ........... 65
To print Data Editor contents ....... 66
Chapter 6. Working with Multiple Data
Sources .............. 67
Basic Handling of Multiple Data Sources .... 67
Working with Multiple Datasets in Command
Syntax ................ 67
Copying and Pasting Information between Datasets 67
Renaming Datasets ............ 68
Suppressing Multiple Datasets ........ 68
Chapter 7. Data preparation...... 69
Variable properties ............ 69
Defining Variable Properties ......... 69
To Define Variable Properties ....... 70
Defining Value Labels and Other Variable
Properties .............. 70
Assigning the Measurement Level ...... 71
Custom Variable Attributes ........ 72
Copying Variable Properties ........ 72
Setting measurement level for variables with
unknown measurement level ........ 72
Multiple Response Sets .......... 73
Defining Multiple Response Sets ...... 73
Copy Data Properties ........... 75
Copying Data Properties ......... 75
Identifying Duplicate Cases ......... 78
Visual Binning ............. 79
To Bin Variables ............ 79
Binning Variables ........... 80
Automatically Generating Binned Categories .. 81
Copying Binned Categories ........ 82
User-Missing Values in Visual Binning .... 83
Chapter 8. Data Transformations ... 85
Data Transformations ........... 85
Computing Variables ........... 85
Compute Variable: If Cases ........ 85
Compute Variable: Type and Label ..... 86
Functions ............... 86
Missing Values in Functions ......... 86
Random Number Generators ........ 87
Count Occurrences of Values within Cases .... 87
Count Values within Cases: Values to Count .. 87
Count Occurrences: If Cases ........ 87
Shift Values .............. 88
Recoding Values ............. 88
Recode into Same Variables ......... 88
Recode into Same Variables: Old and New Values 89
Recode into Different Variables ........ 90
Recode into Different Variables: Old and New
Values ............... 90
Automatic Recode ............ 91
Rank Cases .............. 92
Rank Cases: Types ........... 93
Rank Cases: Ties............ 93
Date and Time Wizard........... 94
Dates and Times in IBM SPSS Statistics .... 94
Create a Date/Time Variable from a String ... 95
Create a Date/Time Variable from a Set of
Variables .............. 95
Add or Subtract Values from Date/Time
Variables .............. 96
Extract Part of a Date/Time Variable ..... 98
Time Series Data Transformations ....... 98
Define Dates ............. 98
Create Time Series ........... 99
Replace Missing Values ......... 101
Chapter 9. File handling and file
transformations .......... 103
File handling and file transformations ..... 103
Sort cases............... 103
Sort variables ............. 104
Transpose .............. 105
Merging Data Files ........... 105
Add Cases ............. 105
Add Variables ............ 106
Aggregate Data............. 108
Aggregate Data: Aggregate Function .... 109
Aggregate Data: Variable Name and Label... 109
Split file ............... 109
Select cases .............. 110
Select cases: If ............ 111
Select cases: Random sample ....... 111
Select cases: Range........... 111
Weight cases.............. 111
Restructuring Data ........... 112
To Restructure Data .......... 112
Restructure Data Wizard: Select Type .... 112
Restructure Data Wizard (Variables to Cases):
Number of Variable Groups ....... 115
Restructure Data Wizard (Variables to Cases):
Select Variables ............ 115
Restructure Data Wizard (Variables to Cases):
Create Index Variables ......... 116
Restructure Data Wizard (Variables to Cases):
Create One Index Variable ........ 117
Restructure Data Wizard (Variables to Cases):
Create Multiple Index Variables ...... 118
Restructure Data Wizard (Variables to Cases):
Options .............. 118
Restructure Data Wizard (Cases to Variables):
Select Variables ............ 119
Restructure Data Wizard (Cases to Variables):
Sort Data .............. 119
Restructure Data Wizard (Cases to Variables):
Options .............. 119
Restructure Data Wizard: Finish ...... 120
Chapter 10. Working with output ... 123
Working with output ........... 123
Viewer ............... 123
Showing and hiding results ....... 123
Moving, deleting, and copying output .... 123
Changing initial alignment ........ 124
Changing alignment of output items .... 124
Viewer outline ............ 124
Adding items to the Viewer ....... 125
Finding and replacing information in the Viewer 126
Closing output items .......... 127
Copying output into other applications..... 127
Interactive output ............ 128
Export output ............. 128
HTML options ............ 129
Web report options .......... 130
Word/RTF options .......... 131
Excel options ............ 131
PowerPoint options .......... 132
PDF options ............. 132
Text options ............. 133
Graphics only options ......... 134
Graphics format options......... 134
Viewer printing ............ 135
To print output and charts ........ 135
Print Preview ............ 135
Page Attributes: Headers and Footers .... 135
Page Attributes: Options......... 136
Saving output ............. 136
To save a Viewer document ....... 137
Chapter 11. Pivot tables ....... 139
Pivot tables .............. 139
Manipulating a pivot table ......... 139
Activating a pivot table ......... 139
Pivoting a table............ 139
Changing display order of elements within a
dimension ............. 139
Moving rows and columns within a dimension
element .............. 139
Transposing rows and columns ...... 140
Grouping rows or columns ........ 140
Ungrouping rows or columns ....... 140
Rotating row or column labels....... 140
Sorting rows............. 140
Inserting rows and columns ....... 141
Controlling display of variable and value labels 141
Changing the output language ...... 141
Navigating large tables ......... 142
Undoing changes ........... 142
Working with layers ........... 142
Creating and displaying layers ...... 142
Go to layer category .......... 143
Showing and hiding items ......... 143
Hiding rows and columns in a table..... 143
Showing hidden rows and columns in a table 143
Hiding and showing dimension labels .... 143
Hiding and showing table titles ...... 143
TableLooks .............. 143
To apply a TableLook.......... 144
To edit or create a TableLook ....... 144
Table properties ............ 144
To change pivot table properties ...... 144
Table properties: general......... 144
Table properties: notes ......... 145
Table properties: cell formats ....... 146
Table properties: borders ........ 146
Table properties: printing ........ 146
Cell properties ............. 147
Font and background.......... 147
Format value ............ 147
Alignment and margins ......... 147
Footnotes and captions .......... 147
Adding footnotes and captions ...... 147
To hide or show a caption ........ 148
To hide or show a footnote in a table .... 148
Footnote marker ........... 148
Renumbering footnotes ......... 149
Editing footnotes in legacy tables...... 149
Data cell widths ............ 149
Changing column width.......... 150
Displaying hidden borders in a pivot table ... 150
Selecting rows, columns and cells in a pivot table 150
Printing pivot tables ........... 150
Controlling table breaks for wide and long
tables ............... 150
Creating a chart from a pivot table ...... 151
Legacy tables ............. 151
Chapter 12. Models ......... 153
Interacting with a model ......... 153
Working with the Model Viewer ...... 153
Printing a model ............ 154
Exporting a model............ 155
Saving fields used in the model to a new dataset 155
Saving predictors to a new dataset based on
importance .............. 155
Ensemble Viewer ............ 155
Models for Ensembles ......... 155
Split Model Viewer ........... 157
Chapter 13. Automated Output
Modification ............ 159
Style Output: Select ........... 159
Style Output ............. 160
Style Output: Labels and Text ....... 162
Style Output: Indexing ......... 162
Style Output: TableLooks ........ 163
Style Output: Size ........... 163
Table Style .............. 163
Table Style: Condition ......... 164
Table Style: Format .......... 164
Chapter 14. Working with Command
Syntax .............. 167
Syntax Rules ............. 167
Pasting Syntax from Dialog Boxes ...... 168
To Paste Syntax from Dialog Boxes ..... 168
Copying Syntax from the Output Log ..... 168
To Copy Syntax from the Output Log .... 169
Using the Syntax Editor .......... 169
Syntax Editor Window ......... 169
Terminology............. 170
Auto-Completion ........... 171
Color Coding ............ 171
Breakpoints ............. 172
Bookmarks ............. 173
Commenting or Uncommenting Text .... 174
Formatting Syntax........... 174
Running Command Syntax ........ 175
Character Set Encoding in Syntax Files .... 176
Multiple Execute Commands ....... 176
Character Set Encoding in Syntax Files ..... 177
Multiple Execute Commands ........ 177
Encrypting syntax files .......... 178
Chapter 15. Overview of the chart
facility .............. 181
Building and editing a chart ........ 181
Building Charts ........... 181
Editing Charts ............ 182
Chart definition options .......... 183
Adding and Editing Titles and Footnotes ... 183
Setting General Options ......... 184
Chapter 16. Scoring data with
predictive models ......... 185
Scoring Wizard ............. 185
Matching model fields to dataset fields .... 186
Selecting scoring functions ........ 187
Scoring the active dataset ........ 188
Merging model and transformation XML files .. 188
Chapter 17. Utilities......... 191
Utilities ............... 191
Variable information ........... 191
Data file comments ........... 191
Variable sets .............. 192
Defining variable sets .......... 192
Using variable sets to show and hide variables .. 192
Reordering target variable lists ....... 193
Chapter 18. Options ........ 195
Options ............... 195
General options ............ 195
Viewer Options............. 196
Data Options ............. 197
Changing the default variable view ..... 199
Language options ............ 199
Currency options ............ 200
To create custom currency formats ..... 200
Output options ............. 200
Chart options ............. 201
Data Element Colors .......... 201
Data Element Lines .......... 201
Data Element Markers ......... 202
Data Element Fills ........... 202
Pivot table options ........... 203
File locations options ........... 204
Script options ............. 205
Syntax editor options........... 205
Multiple imputations options ........ 206
Chapter 19. Customizing Menus and
Toolbars ............. 209
Customizing Menus and Toolbars ...... 209
Menu Editor.............. 209
Customizing Toolbars .......... 209
Show Toolbars ............. 209
To Customize Toolbars .......... 209
Toolbar Properties ........... 210
Edit Toolbar ............. 210
Create New Tool ........... 210
Chapter 20. Extensions ....... 211
Extension Hub ............. 211
Explore tab ............. 212
Installed tab ............. 212
Settings .............. 213
Extension Details ........... 213
Installing local extension bundles....... 214
Installation locations for extensions ..... 214
Required R packages .......... 215
Batch installation of extension bundles .... 216
Creating and managing custom dialogs..... 216
Custom Dialog Builder layout ....... 217
Building a custom dialog ........ 217
Dialog Properties ........... 218
Specifying the Menu Location for a Custom
Dialog............... 219
Laying out controls on the canvas ..... 220
Building the Syntax Template ....... 220
Previewing a custom dialog ....... 222
Control types ............ 222
Extension Properties .......... 243
Managing custom dialogs ........ 246
Custom Dialogs for Extension Commands... 250
Creating Localized Versions of Custom Dialogs 251
Creating and editing extension bundles..... 252
Chapter 21. Production jobs ..... 255
Syntax files .............. 256
Output ............... 256
HTML options ............ 257
PowerPoint options .......... 257
PDF options ............. 257
Text options ............. 258
Production jobs with OUTPUT commands .. 258
Runtime values............. 258
Run options .............. 259
Server login .............. 259
Adding and Editing Server Login Settings... 259
User prompts ............. 260
Background job status .......... 260
Running production jobs from a command line .. 260
Converting Production Facility files ...... 261
Chapter 22. Output Management
System .............. 263
Output object types ........... 264
Command identifiers and table subtypes .... 265
Labels ................ 266
OMS options ............. 266
Logging ............... 268
Excluding output display from the viewer.... 269
Routing output to IBM SPSS Statistics data files 269
Data files created from multiple tables .... 269
Controlling column elements to control
variables in the data file ......... 270
Variable names in OMS-generated data files .. 270
OXML table structure .......... 270
OMS identifiers ............ 273
Copying OMS identifiers from the viewer
outline .............. 273
Chapter 23. Scripting Facility .... 275
Autoscripts .............. 276
Creating Autoscripts .......... 276
Associating Existing Scripts with Viewer Objects 277
Scripting with the Python Programming Language 277
Running Python Scripts and Python programs 278
Script Editor for the Python Programming
Language.............. 279
Scripting in Basic ............ 279
Compatibility with Versions Prior to 16.0 ... 279
The scriptContext Object ........ 281
Startup Scripts ............. 282
Chapter 24. TABLES and IGRAPH
Command Syntax Converter ..... 285
Chapter 25. Encrypting data files,
output documents, and syntax files.. 287
Notices .............. 289
Trademarks .............. 291
Index ............... 293
Chapter 1. Overview
Windows
There are a number of different types of windows in IBM® SPSS® Statistics:
Data Editor. The Data Editor displays the contents of the data file. You can create new data files or
modify existing data files with the Data Editor. If you have more than one data file open, there is a
separate Data Editor window for each data file.
Viewer. All statistical results, tables, and charts are displayed in the Viewer. You can edit the output and
save it for later use. A Viewer window opens automatically the first time you run a procedure that
generates output.
Pivot Table Editor. Output that is displayed in pivot tables can be modified in many ways with the Pivot
Table Editor. You can edit text, swap data in rows and columns, add color, create multidimensional tables,
and selectively hide and show results.
Chart Editor. You can modify high-resolution charts and plots in chart windows. You can change the
colors, select different type fonts or sizes, switch the horizontal and vertical axes, rotate 3-D scatterplots,
and even change the chart type.
Text Output Editor. Text output that is not displayed in pivot tables can be modified with the Text
Output Editor. You can edit the output and change font characteristics (type, style, color, size).
Syntax Editor. You can paste your dialog box choices into a syntax window, where your selections appear
in the form of command syntax. You can then edit the command syntax to use special features that are
not available through dialog boxes. You can save these commands in a file for use in subsequent sessions.
Designated window versus active window
If you have more than one open Viewer window, output is routed to the designated Viewer window. If
you have more than one open Syntax Editor window, command syntax is pasted into the designated
Syntax Editor window. The designated windows are indicated by a plus sign in the icon in the title bar.
You can change the designated windows at any time.
The designated window should not be confused with the active window, which is the currently selected
window. If you have overlapping windows, the active window appears in the foreground. If you open a
window, that window automatically becomes the active window and the designated window.
Changing the designated window
1. Make the window that you want to designate the active window (click anywhere in the window).
2. Click the Designate Window button on the toolbar (the plus sign icon), or from the menus choose:
Utilities > Designate Window
Note: For Data Editor windows, the active Data Editor window determines the dataset that is used in
subsequent calculations or analyses. There is no "designated" Data Editor window. See the topic “Basic
Handling of Multiple Data Sources” on page 67 for more information.
© Copyright IBM Corporation 1989, 2016
Variable names and variable labels in dialog box lists
You can display either variable names or variable labels in dialog box lists, and you can control the sort
order of variables in source variable lists. To control the default display attributes of variables in source
lists, choose Options on the Edit menu. See the topic “General options” on page 195 for more
information.
You can also change the variable list display attributes within dialogs. The method for changing the
display attributes depends on the dialog:
• If the dialog provides sorting and display controls above the source variable list, use those controls to
change the display attributes.
• If the dialog does not contain sorting controls above the source variable list, right-click any variable in
the source list and select the display attributes from the pop-up menu.
You can display either variable names or variable labels (names are displayed for any variables without
defined labels), and you can sort the source list by file order, alphabetical order, or measurement level. (In
dialogs with sorting controls above the source variable list, the default selection of None sorts the list in
file order.)
Data type, measurement level, and variable list icons
The icons that are displayed next to variables in dialog box lists provide information about the variable
type and measurement level.
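Table 1. Measurement level icons (icon images are not reproduced in this text)
                     Numeric   String   Date     Time
Scale (Continuous)   (icon)    n/a      (icon)   (icon)
Ordinal              (icon)    (icon)   (icon)   (icon)
Nominal              (icon)    (icon)   (icon)   (icon)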
• For more information on measurement level, see “Variable measurement level” on page 53.
• For more information on numeric, string, date, and time data types, see “Variable type” on page 53.
Statistics Coach
If you are unfamiliar with IBM SPSS Statistics or with the available statistical procedures, the Statistics
Coach can help you get started by prompting you with simple questions, nontechnical language, and
visual examples that help you select the basic statistical and charting features that are best suited for your
data.
To use the Statistics Coach, from the menus in any IBM SPSS Statistics window choose:
Help > Statistics Coach
The Statistics Coach covers only a selected subset of procedures. It is designed to provide general
assistance for many of the basic, commonly used statistical techniques.
Finding out more
For a comprehensive overview of the basics, see the online tutorial. From any IBM SPSS Statistics menu
choose:
Help > Tutorial
Chapter 2. Getting Help
The Help system contains a number of different sections.
Help
Information on the user interface. There is a separate section for each optional module.
Reference
Reference information for the command language, GPL, VizML, and schemas. The reference
material for the command language is also available in PDF form: Help>Command Syntax
Reference.
Tutorial
Step-by-step instructions on how to use many of the basic features.
Case Studies
Hands-on examples of how to create various types of statistical analyses and how to interpret the
results.
Statistics Coach
Guides you through the process of finding the procedure that you want to use.
Integration Plug-ins
Separate sections for each programming plug-in, including Python, R, Java, and .NET.
Context-sensitive help
In many places in the user interface, you can get context-sensitive help.
• Help buttons in dialog boxes take you directly to the help topic for that dialog.
• Right-click on terms in an activated pivot table in the Viewer and choose What's This? from the
pop-up menu to display definitions of the terms.
• In a command syntax window, position the cursor anywhere within a syntax block for a command and
press F1 on the keyboard. The help for that command is displayed.
Other resources
Answers to many common problems can be found at http://www.ibm.com/support.
If you're a student using a student, academic or grad pack version of any IBM SPSS software product,
please see our special online Solutions for Education pages for students. If you're a student using a
university-supplied copy of the IBM SPSS software, please contact the IBM SPSS product coordinator at
your university.
The IBM SPSS Predictive Analytics community has resources for all levels of users and application
developers. Download utilities, graphics examples, new statistical modules, and articles. Visit the IBM
SPSS Predictive Analytics community at https://developer.ibm.com/predictiveanalytics/.
Documentation in PDF format is available at http://www.ibm.com/support/docview.wss?uid=swg27047033.
Documentation of statistical algorithms is available at http://www.ibm.com/support/docview.wss?uid=swg27047033.
Chapter 3. Data files
Data files come in a wide variety of formats, and this software is designed to handle many of them,
including:
• Excel spreadsheets
• Database tables from many database sources, including Oracle, SQL Server, DB2, and others
• Tab-delimited, CSV, and other types of simple text files
• SAS data files
• Stata data files
Opening data files
In addition to files saved in IBM SPSS Statistics format, you can open Excel, SAS, Stata, tab-delimited,
and other files without converting the files to an intermediate format or entering data definition
information.
• Opening a data file makes it the active dataset. If you already have one or more open data files, they
remain open and available for subsequent use in the session. Clicking anywhere in the Data Editor
window for an open data file will make it the active dataset. See the topic Chapter 6, “Working with
Multiple Data Sources,” on page 67 for more information.
• In distributed analysis mode using a remote server to process commands and run procedures, the
available data files, folders, and drives are dependent on what is available on or from the remote
server. The current server name is indicated at the top of the dialog box. You will not have access to
data files on your local computer unless you specify the drive as a shared device and the folders
containing your data files as shared folders. See the topic Chapter 4, “Distributed Analysis Mode,” on
page 45 for more information.
To open data files
1. From the menus choose:
File > Open > Data...
2. In the Open Data dialog box, select the file that you want to open.
3. Click Open.
Optionally, you can:
• Automatically set the width of each string variable to the longest observed value for that variable
using Minimize string widths based on observed values. This is particularly useful when reading
code page data files in Unicode mode. See the topic “General options” on page 195 for more
information.
• Read variable names from the first row of spreadsheet files.
• Specify a range of cells to read from spreadsheet files.
• Specify a worksheet within an Excel file to read (Excel 95 or later).
For information on reading data from databases, see “Reading Database Files” on page 14. For
information on reading data from text data files, see “Text Wizard” on page 10. For information on
reading IBM Cognos® data, see “Reading Cognos BI data” on page 20.
Data file types
SPSS Statistics. Opens data files that are saved in IBM SPSS Statistics format, and also data files from the
DOS product SPSS/PC+.
SPSS Statistics Compressed. Opens data files that are saved in IBM SPSS Statistics compressed format.
SPSS/PC+. Opens SPSS/PC+ data files. This option is available only on Windows operating systems.
Portable. Opens data files that are saved in portable format. Saving a file in portable format takes
considerably longer than saving the file in IBM SPSS Statistics format.
Excel. Opens Excel files.
Lotus 1-2-3. Opens data files that are saved in 1-2-3 format for release 3.0, 2.0, or 1A of Lotus.
SYLK. Opens data files that are saved in SYLK (symbolic link) format, a format that is used by some
spreadsheet applications.
dBASE. Opens dBASE-format files for either dBASE IV, dBASE III or III PLUS, or dBASE II. Each case is
a record. Variable and value labels and missing-value specifications are lost when you save a file in this
format.
SAS. SAS versions 6–9 and SAS transport files.
Stata. Stata versions 4–13.
Reading Excel Files
This topic applies to Excel 95 and later files. To read Excel 4 or earlier versions, see the topic “Reading
Older Excel Files and Other Spreadsheets” on page 9.
Worksheet
Excel files can contain multiple worksheets. By default, the Data Editor reads the first worksheet.
To read a different worksheet, select the worksheet from the list.
Range
You can also read a range of cells. Use the same method for specifying cell ranges as you would
in Excel. For example: A1:D10.
Read variable names from first row of data
You can read variable names from the first row of the file or the first row of the defined range.
Values that don't conform to variable naming rules are converted to valid variable names, and the
original names are used as variable labels.
Percentage of values that determine data type
The data type for each variable is determined by the percentage of values that conform to the
same format.
• The value must be greater than 50.
• The denominator used to determine the percentage is the number of non-blank values for each
variable.
• If no consistent format is used by the specified percentage of values, the variable is assigned
the string data type.
• For variables that are assigned a numeric format (including date and time formats) based on
the percentage value, values that do not conform to that format are assigned the
system-missing value.
Ignore hidden rows and columns
Hidden rows and columns in the Excel file are not included. This option is available only for
Excel 2007 and later files (XLSX, XLSM).
Remove leading spaces from string values
Any blank spaces at the beginning of string values are removed.
Remove trailing spaces from string values
Blank spaces at the end of the string values are removed. This setting affects the calculation of the
defined width of string variables.
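The rule described under "Percentage of values that determine data type" can be sketched in Python. This is only an illustration of the stated rule, not the software's implementation. The function name infer_column_type, the two candidate formats (plain numbers and YYYY-MM-DD dates), the use of None for the system-missing value, the "at least" comparison against the threshold, and the treatment of an all-blank column as string are all assumptions made for the example.

```python
from datetime import datetime

SYSMIS = None  # stand-in for the system-missing value (assumption)


def parse_number(text):
    """Return a float if text is a plain number, otherwise None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_iso_date(text):
    """Return a date if text has the form YYYY-MM-DD, otherwise None."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


# Candidate formats, tried in order (hypothetical, for illustration only).
CANDIDATE_FORMATS = [("numeric", parse_number), ("date", parse_iso_date)]


def infer_column_type(cells, percent):
    """Return (type_name, values) for one spreadsheet column."""
    # The threshold must be greater than 50.
    if not 50 < percent <= 100:
        raise ValueError("percent must be greater than 50 and at most 100")
    texts = ["" if cell is None else str(cell) for cell in cells]
    # The denominator is the number of non-blank values.
    non_blank = [t for t in texts if t.strip()]
    for type_name, parse in CANDIDATE_FORMATS:
        if not non_blank:
            break  # assumption: an all-blank column falls through to string
        conforming = sum(1 for t in non_blank if parse(t.strip()) is not None)
        if 100.0 * conforming / len(non_blank) >= percent:
            # Blank and non-conforming values become system-missing.
            values = [parse(t.strip()) if t.strip() else SYSMIS for t in texts]
            return type_name, values
    # No format reached the threshold: the variable is a string.
    return "string", texts
```

With the column "1", "2", "x", blank, "4", three of the four non-blank values are numeric (75 percent). A threshold of 60 therefore yields a numeric variable in which "x" and the blank cell are system-missing, while a threshold of 80 yields a string variable.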
Reading Older Excel Files and Other Spreadsheets
This topic applies to reading Excel 4 or earlier files, Lotus 1-2-3 files and SYLK format spreadsheet files.
For information on reading Excel 95 or later files, see the topic “Reading Excel Files” on page 8.
Read variable names. For spreadsheets, you can read variable names from the first row of the file or the
first row of the defined range. The values are converted as necessary to create valid variable names,
including converting spaces to underscores.
Range. For spreadsheet data files, you can also read a range of cells. Use the same method for specifying
cell ranges as you would with the spreadsheet application.
How Spreadsheets are Read
• The data type and width for each variable are determined by the column width and data type of the
first data cell in the column. Values of other types are converted to the system-missing value. If the
first data cell in the column is blank, the global default data type for the spreadsheet (usually numeric)
is used.
• For numeric variables, blank cells are converted to the system-missing value, indicated by a period. For
string variables, a blank is a valid string value, and blank cells are treated as valid string values.
• If you do not read variable names from the spreadsheet, the column letters (A, B, C, ...) are used for
variable names for Excel and Lotus files. For SYLK files and Excel files saved in R1C1 display format,
the software uses the column number preceded by the letter C for variable names (C1, C2, C3, ...).
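The default naming rule in the last item can be illustrated with a short Python sketch. It is not the product's implementation: the function name `default_variable_names` is hypothetical, and the continuation past Z (AA, AB, ...) is assumed from the usual spreadsheet column convention, since the text above lists only A, B, C, ....

```python
def default_variable_names(n_columns, r1c1=False):
    """Return illustrative default names for n_columns spreadsheet columns.

    Letter names (A, B, ..., Z, AA, AB, ...) model the Excel/Lotus case;
    C1, C2, ... model the SYLK / R1C1 case.
    """
    names = []
    for index in range(1, n_columns + 1):
        if r1c1:
            # Column number preceded by the letter C
            names.append(f"C{index}")
            continue
        # Convert a 1-based column number to spreadsheet letters
        letters = ""
        n = index
        while n > 0:
            n, remainder = divmod(n - 1, 26)
            letters = chr(ord("A") + remainder) + letters
        names.append(letters)
    return names


print(default_variable_names(3))             # ['A', 'B', 'C']
print(default_variable_names(3, r1c1=True))  # ['C1', 'C2', 'C3']
```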
Reading dBASE files
Database files are logically very similar to IBM SPSS Statistics data files. The following general rules
apply to dBASE files:
• Field names are converted to valid variable names.
• Colons used in dBASE field names are translated to underscores.
• Records marked for deletion but not actually purged are included. The software creates a new string
variable, D_R, which contains an asterisk for cases marked for deletion.
Reading Stata files
The following general rules apply to Stata data files:
• Variable names. Stata variable names are converted to IBM SPSS Statistics variable names in
case-sensitive form. Stata variable names that are identical except for case are converted to valid
variable names by appending an underscore and a sequential letter (_A, _B, _C, ..., _Z, _AA, _AB, ...,
and so forth).
• Variable labels. Stata variable labels are converted to IBM SPSS Statistics variable labels.
• Value labels. Stata value labels are converted to IBM SPSS Statistics value labels, except for Stata
value labels assigned to "extended" missing values. Value labels longer than 120 bytes are truncated.
• String variables. Stata strl variables are converted to string variables. Values longer than 32K bytes are
truncated. Stata strl values that contain blobs (binary large objects) are converted to blank strings.
• Missing values. Stata "extended" missing values are converted to system-missing values.
• Date conversion. Stata date format values are converted to IBM SPSS Statistics DATE format (d-m-y)
values. Stata "time-series" date format values (weeks, months, quarters, and so on) are converted to
simple numeric (F) format, preserving the original, internal integer value, which is the number of
weeks, months, quarters, and so on, since the start of 1960.
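Because "time-series" date values arrive as plain integers, they have to be interpreted after import. The following Python sketch shows the arithmetic for monthly and quarterly values. The function names are hypothetical, and the sketch assumes that a value of 0 denotes the first period of 1960, which follows from "since the start of 1960" above.

```python
def months_since_1960_to_year_month(value):
    """Convert a count of months since January 1960 to (year, month)."""
    years, months = divmod(value, 12)
    return 1960 + years, months + 1


def quarters_since_1960_to_year_quarter(value):
    """Convert a count of quarters since the first quarter of 1960 to (year, quarter)."""
    years, quarters = divmod(value, 4)
    return 1960 + years, quarters + 1


print(months_since_1960_to_year_month(13))     # (1961, 2)
print(quarters_since_1960_to_year_quarter(5))  # (1961, 2)
```

Negative values map to periods before 1960, because `divmod` rounds toward negative infinity.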
Reading CSV Files
To read CSV files, from the menus choose: File > Import Data > CSV
The Read CSV File dialog reads CSV format text data files that use a comma, a semicolon, or a tab as the
delimiter between values.
If the text file uses a different delimiter, contains text at the beginning of the file that is not variable
names or data values, or has other special considerations, use the Text Wizard to read the files.
First line contains variable names
The first non-blank line in the file contains label text that is used as variable names. Values that
are invalid as variable names are automatically converted to valid variable names.
Remove leading spaces from string values
Any blank spaces at the beginning of string values are removed.
Remove trailing spaces from string values
Blank spaces at the end of the string values are removed. This setting affects the calculation of the
defined width of string variables.
Delimiter between values
The delimiter can be a comma, a semicolon, or a tab. If the delimiter is any other character or a
blank space, use the Text Wizard to read the file.
Decimal symbol
The symbol that is used to indicate decimals in the text data file. The symbol can be a period or a
comma.
Text Qualifier
Character that is used to enclose values that contain the delimiter character. The qualifier appears
at the start and the end of the value. The qualifier can be double quotation mark, single quotation
mark, or none.
Percentage of values that determine data type
The data type for each variable is determined by the percentage of values that conform to the
same format.
• The value must be greater than 50.
• If no consistent format is used by the specified percentage of values, the variable is assigned
the string data type.
• For variables that are assigned a numeric format (including date and time formats) based on
the percentage value, values that do not conform to that format are assigned the
system-missing value.
Cache data locally
A data cache is a complete copy of the data file that is stored in temporary disk space. Caching
the data file can improve performance.
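To make the interaction of the delimiter, decimal symbol, and text qualifier settings concrete, here is a minimal Python sketch that parses a semicolon-delimited file with a comma as the decimal symbol. It uses Python's standard `csv` module rather than anything from IBM SPSS Statistics; the function name `read_csv_text` and the sample data are hypothetical.

```python
import csv


def read_csv_text(text, delimiter=";", qualifier='"', decimal=","):
    """Parse delimited text into (names, rows).

    The first non-blank line supplies the variable names. Values that look
    numeric once the decimal symbol is normalized become floats; everything
    else is kept as a string.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    reader = csv.reader(lines, delimiter=delimiter, quotechar=qualifier)
    names = next(reader)
    rows = []
    for record in reader:
        row = []
        for value in record:
            candidate = value.strip()
            if decimal != ".":
                candidate = candidate.replace(decimal, ".")
            try:
                row.append(float(candidate))
            except ValueError:
                row.append(value)  # not numeric: keep the original string
        rows.append(row)
    return names, rows


sample = 'name;score\n"Smith; J.";3,5\nLee;4\n'
print(read_csv_text(sample))
# (['name', 'score'], [['Smith; J.', 3.5], ['Lee', 4.0]])
```

The qualifier keeps `Smith; J.` together as one value even though it contains the delimiter character.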
Text Wizard
The Text Wizard can read text data files formatted in a variety of ways:
• Tab-delimited files
• Space-delimited files
• Comma-delimited files
• Fixed-field format files
For delimited files, you can also specify other characters as delimiters between values, and you can
specify multiple delimiters.
To Read Text Data Files
1. From the menus choose:
File > Import Data > Text Data...
2. Select the text file in the Open Data dialog box.
3. If necessary, select the encoding of the file.
4. Follow the steps in the Text Wizard to define how to read the data file.
Encoding
The encoding of a file affects the way character data are read. Unicode data files typically contain a byte
order mark that identifies the character encoding. Some applications create Unicode files without a byte
order mark, and code page data files do not contain any encoding identifier.
• Unicode (UTF-8). Reads the file as Unicode UTF-8.
• Unicode (UTF-16). Reads the file as Unicode UTF-16 in the endianness of the operating system.
• Unicode (UTF-16BE). Reads the file as Unicode UTF-16, big endian.
• Unicode (UTF-16LE). Reads the file as Unicode UTF-16, little endian.
• Local Encoding. Reads the file in current locale code page character encoding.
If a file contains a Unicode byte order mark, it is read in that Unicode encoding, regardless of the
encoding you select. If a file does not contain a Unicode byte order mark, by default the encoding is
assumed to be the current locale code page character encoding, unless you select one of the Unicode
encodings.
To change the current locale for data files in a different code page character encoding, select Edit > Options
from the menus, and change the locale on the Language tab.
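The precedence described above (byte order mark first, then the selected encoding, then the current locale) can be sketched in Python as follows. This only illustrates the decision order and is not the product's code; `choose_encoding` is a hypothetical name, and the encoding labels are Python codec names.

```python
import codecs
import locale

# Byte order marks paired with a Python codec that consumes the mark
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def choose_encoding(raw_bytes, selected=None):
    """Pick an encoding: BOM wins, then the user's selection, then the locale."""
    for bom, encoding in BOMS:
        if raw_bytes.startswith(bom):
            return encoding
    if selected is not None:
        return selected
    return locale.getpreferredencoding(False)


data = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
encoding = choose_encoding(data, selected="utf-16-le")
print(encoding, data.decode(encoding))  # the BOM overrides the selection
```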
Text Wizard: Step 1
The text file is displayed in a preview window. You can apply a predefined format (previously saved
from the Text Wizard) or follow the steps in the Text Wizard to specify how the data should be read.
Text Wizard: Step 2
This step provides information about variables. A variable is similar to a field in a database. For example,
each item in a questionnaire is a variable.
How are your variables arranged?
The arrangement of variables defines the method that is used to differentiate one variable from
the next.
Delimited
Spaces, commas, tabs, or other characters are used to separate variables. The variables are
recorded in the same order for each case but not necessarily in the same column
locations.
Fixed width
Each variable is recorded in the same column location on the same record (line) for each
case in the data file. No delimiter is required between variables. The column location
determines which variable is being read.
Note: The Text Wizard cannot read fixed-width Unicode text files. You can use the DATA
LIST command to read fixed-width Unicode files.
Are variable names included at the top of your file?
The values on the specified line number are used to create variable names. Values that don't
conform to variable naming rules are converted to valid variable names.
What is the decimal symbol?
The character that indicates decimal values can be either a period or a comma.
Text Wizard: Step 3 (Delimited Files)
This step provides information about cases. A case is similar to a record in a database. For example, each
respondent to a questionnaire is a case.
The first case of data begins on which line number? Indicates the first line of the data file that contains
data values. If the top line(s) of the data file contain descriptive labels or other text that does not
represent data values, this will not be line 1.
How are your cases represented? Controls how the Text Wizard determines where each case ends and
the next one begins.
• Each line represents a case. Each line contains only one case. It is fairly common for each case to be
contained on a single line (row), even though this can be a very long line for data files with a large
number of variables. If not all lines contain the same number of data values, the number of variables
for each case is determined by the line with the greatest number of data values. Cases with fewer data
values are assigned missing values for the additional variables.
• A specific number of variables represents a case. The specified number of variables for each case
tells the Text Wizard where to stop reading one case and start reading the next. Multiple cases can be
contained on the same line, and cases can start in the middle of one line and be continued on the next
line. The Text Wizard determines the end of each case based on the number of values read, regardless
of the number of lines. Each case must contain data values (or missing values indicated by delimiters)
for all variables, or the data file will be read incorrectly.
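The second option can be pictured as reading one continuous stream of values and cutting it into cases of a fixed size. The Python sketch below illustrates that idea under simplifying assumptions (a single delimiter, no text qualifier); `split_into_cases` is a hypothetical helper, not part of the Text Wizard.

```python
def split_into_cases(lines, n_variables, delimiter=","):
    """Group a stream of delimited values into cases of n_variables values each."""
    values = []
    for line in lines:
        if line.strip():
            # Line breaks are ignored; only the count of values matters
            values.extend(field.strip() for field in line.split(delimiter))
    return [values[i:i + n_variables] for i in range(0, len(values), n_variables)]


lines = ["1,2,3,4", "5,6", "7,8,9"]
print(split_into_cases(lines, 3))
# [['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
```

An empty field between two delimiters is still counted as a value, which is why a missing value must be marked by its delimiter. If a value is simply left out, every later value shifts into the wrong variable.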
How many cases do you want to import? You can import all cases in the data file, the first n cases (n is a
number you specify), or a random sample of a specified percentage. Since the random sampling routine
makes an independent pseudo-random decision for each case, the percentage of cases selected can only
approximate the specified percentage. The more cases there are in the data file, the closer the percentage
of cases selected is to the specified percentage.
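The reason the sample size is only approximate can be seen in a small Python sketch of per-case sampling. The actual routine is not documented here beyond the "independent pseudo-random decision for each case", so the function `sample_cases` below is an assumed, simplified model of that behavior.

```python
import random


def sample_cases(cases, percentage, seed=None):
    """Keep each case independently with probability percentage / 100."""
    rng = random.Random(seed)
    probability = percentage / 100.0
    return [case for case in cases if rng.random() < probability]


for total in (100, 10_000, 1_000_000):
    selected = sample_cases(range(total), 10, seed=1)
    print(total, f"{100.0 * len(selected) / total:.2f}% selected")
```

Running the loop shows the selected percentage settling closer to 10 as the number of cases grows.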
Text Wizard: Step 3 (Fixed-Width Files)
This step provides information about cases. A case is similar to a record in a database. For example, each
respondent to a questionnaire is a case.
The first case of data begins on which line number? Indicates the first line of the data file that contains
data values. If the top line(s) of the data file contain descriptive labels or other text that does not
represent data values, this will not be line 1.
How many lines represent a case? Controls how the Text Wizard determines where each case ends and
the next one begins. Each variable is defined by its line number within the case and its column location.
You need to specify the number of lines for each case to read the data correctly.
How many cases do you want to import? You can import all cases in the data file, the first n cases (n is a
number you specify), or a random sample of a specified percentage. Since the random sampling routine
makes an independent pseudo-random decision for each case, the percentage of cases selected can only
approximate the specified percentage. The more cases there are in the data file, the closer the percentage
of cases selected is to the specified percentage.
Text Wizard: Step 4 (Delimited Files)
This step specifies delimiters and text qualifiers that are used in the text data file. You can also specify
the treatment of leading and trailing spaces in string values.
Which delimiters appear between variables?
The characters or symbols that separate data values. You can select any combination of spaces,
commas, semicolons, tabs, or other characters. Multiple, consecutive delimiters without
intervening data values are treated as missing values.
What is the text qualifier?
Characters that are used to enclose values that contain delimiter characters. The text qualifier
appears at both the beginning and the end of the value, enclosing the entire value.
Leading and Trailing Spaces
Controls the treatment of leading and trailing blank spaces in string values.
Remove leading spaces from string values
Any blank spaces at the beginning of string values are removed.
Remove trailing spaces from string values
Blank spaces at the end of a value are ignored when the defined width of string variables
is calculated. If Space is selected as a delimiter, multiple consecutive blank spaces are not
treated as multiple delimiters.
Text Wizard: Step 4 (Fixed-Width Files)
This step displays the Text Wizard's best guess on how to read the data file and allows you to modify
how the Text Wizard will read variables from the data file. Vertical lines in the preview window indicate
where the Text Wizard currently thinks each variable begins in the file.
Insert, move, and delete variable break lines as necessary to separate variables. If multiple lines are used
for each case, the data will be displayed as one line for each case, with subsequent lines appended to the
end of the line.
Notes:
For computer-generated data files that produce a continuous stream of data values with no intervening
spaces or other distinguishing characteristics, it may be difficult to determine where each variable begins.
Such data files usually rely on a data definition file or some other written description that specifies the
line and column location for each variable.
Text Wizard: Step 5
This step controls the variable name and the data format that is used to read each variable. You can also
specify variables to exclude.
Variable name
You can overwrite the default variable names with your own variable names. If you read variable
names from the data file, names that do not conform to variable naming rules are automatically
modified. Select a variable in the preview window and then enter a variable name.
Data format
Select a variable in the preview window and then select a format from the list.
• Automatic determines the data format based on an evaluation of all the data values.
• To exclude a variable, select Do Not Import.
Percentage of values that determine Automatic data format
For automatic format, the data format for each variable is determined by the percentage of values
that conform to the same format.
• The value must be greater than 50.
• The denominator used to determine the percentage is the number of non-blank values for each
variable.
• If no consistent format is used by the specified percentage of values, the variable is assigned
the string data type.
• For variables that are assigned a numeric format (including date and time formats) based on
the percentage value, values that do not conform to that format are assigned the
system-missing value.
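The percentage rule can be sketched in Python as follows. This is a simplified model, not the Text Wizard's algorithm: it distinguishes only "numeric" from "string" (the real software also recognizes date and time formats), the default of 95 is an arbitrary choice for the example, and treating the threshold as "at least this percentage" is an assumption. `None` stands in for the system-missing value.

```python
def infer_column(values, percentage=95):
    """Return (type_name, converted_values) for one column of raw strings."""
    if percentage <= 50:
        raise ValueError("percentage must be greater than 50")

    # The denominator is the number of non-blank values
    non_blank = [v for v in values if v.strip()]
    numeric_count = 0
    for v in non_blank:
        try:
            float(v)
            numeric_count += 1
        except ValueError:
            pass

    if non_blank and 100.0 * numeric_count / len(non_blank) >= percentage:
        converted = []
        for v in values:
            try:
                converted.append(float(v))
            except ValueError:
                converted.append(None)  # non-conforming value becomes missing
        return "numeric", converted

    # No format reached the threshold: fall back to string
    return "string", list(values)


print(infer_column(["1", "2", "x", ""], percentage=60))
# ('numeric', [1.0, 2.0, None, None])
print(infer_column(["1", "a", "b"], percentage=60))
# ('string', ['1', 'a', 'b'])
```

In the first call, two of the three non-blank values are numeric (about 67 percent), so the column is numeric and the stray `x` is lost. This is the trade-off to keep in mind when lowering the percentage.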
Text Wizard Formatting Options: Formatting options for reading variables with the Text Wizard
include:
Automatic. The format is determined based on an evaluation of all the data values.
Do not import. Omit the selected variable(s) from the imported data file.
Numeric. Valid values include numbers, a leading plus or minus sign, and a decimal indicator.
String. Valid values include virtually any keyboard characters and embedded blanks. For delimited files,
you can specify the number of characters in the value, up to a maximum of 32,767. By default, the Text
Wizard sets the number of characters to the longest string value encountered for the selected variable(s)
in the first 250 rows of the file. For fixed-width files, the number of characters in string values is defined
by the placement of variable break lines in step 4.
Date/Time. Valid values include dates of the general format dd-mm-yyyy, mm/dd/yyyy, dd.mm.yyyy,
yyyy/mm/dd, hh:mm:ss, and a variety of other date and time formats. Months can be represented in digits,
Roman numerals, or three-letter abbreviations, or they can be fully spelled out. Select a date format from
the list.
Dollar. Valid values are numbers with an optional leading dollar sign and optional commas as thousands
separators.
Comma. Valid values include numbers that use a period as a decimal indicator and commas as thousands
separators.
Dot. Valid values include numbers that use a comma as a decimal indicator and periods as thousands
separators.
Note: Values that contain invalid characters for the selected format will be treated as missing. Values that
contain any of the specified delimiters will be treated as multiple values.
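The difference between the Comma and Dot formats is easy to get backwards, so a short Python sketch may help. The function `parse_grouped_number` is hypothetical and deliberately lenient: it does not check that thousands separators fall in the right positions. It returns `None` to represent a value that would be treated as missing.

```python
def parse_grouped_number(text, style="comma"):
    """Parse a number written in Comma (1,234.56) or Dot (1.234,56) style."""
    if style == "comma":
        thousands, decimal = ",", "."
    elif style == "dot":
        thousands, decimal = ".", ","
    else:
        raise ValueError("style must be 'comma' or 'dot'")

    # Drop thousands separators, then normalize the decimal indicator
    cleaned = text.strip().replace(thousands, "").replace(decimal, ".")
    try:
        return float(cleaned)
    except ValueError:
        return None  # invalid characters: treated as missing


print(parse_grouped_number("1,234.56", style="comma"))  # 1234.56
print(parse_grouped_number("1.234,56", style="dot"))    # 1234.56
print(parse_grouped_number("12a"))                      # None
```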
Text Wizard: Step 6
This is the final step of the Text Wizard. You can save your specifications in a file for use when importing
similar text data files. You can also paste the syntax generated by the Text Wizard into a syntax window.
You can then customize and/or save the syntax for use in other sessions or in production jobs.
Cache data locally. A data cache is a complete copy of the data file, stored in temporary disk space.
Caching the data file can improve performance.
Reading Database Files
You can read data from any database format for which you have a database driver. In local analysis
mode, the necessary drivers must be installed on your local computer. In distributed analysis mode
(available with IBM SPSS Statistics Server), the drivers must be installed on the remote server. See the
topic Chapter 4, “Distributed Analysis Mode,” on page 45 for more information.
Note: If you are running the Windows 64-bit version of IBM SPSS Statistics, you cannot read Excel,
Access, or dBASE database sources, even though they may appear on the list of available database
sources. The 32-bit ODBC drivers for these products are not compatible.
To Read Database Files
1. From the menus choose:
File > Import Data > Database > New Query...
2. Select the data source.
3. If necessary (depending on the data source), select the database file and/or enter a login name,
password, and other information.
4. Select the table(s) and fields. For OLE DB data sources (available only on Windows operating
systems), you can only select one table.
5. Specify any relationships between your tables.
6. Optionally:
• Specify any selection criteria for your data.
• Add a prompt for user input to create a parameter query.
• Save your constructed query before running it.
Connection Pooling
If you access the same database source multiple times in the same session or job, you can improve
performance with connection pooling.
1. In the last step of the wizard, paste the command syntax into a syntax window.
2. At the end of the quoted CONNECT string, add Pooling=true.
To Edit Saved Database Queries
1. From the menus choose:
File > Import Data > Database > Edit Query...
2. Select the query file (*.spq) that you want to edit.
3. Follow the instructions for creating a new query.
To Read Database Files with Saved Queries
1. From the menus choose:
File > Import Data > Database > Run Query...
2. Select the query file (*.spq) that you want to run.
3. If necessary (depending on the database file), enter a login name and password.
4. If the query has an embedded prompt, enter other information if necessary (for example, the quarter
for which you want to retrieve sales figures).
Selecting a Data Source
Use the first screen of the Database Wizard to select the type of data source to read.
ODBC Data Sources
If you do not have any ODBC data sources configured, or if you want to add a new data source, click
Add ODBC Data Source.
• On Linux operating systems, this button is not available. ODBC data sources are specified in odbc.ini,
and the ODBCINI environment variable must be set to the location of that file. For more information,
see the documentation for your database drivers.
• In distributed analysis mode (available with IBM SPSS Statistics Server), this button is not available.
To add data sources in distributed analysis mode, see your system administrator.
An ODBC data source consists of two essential pieces of information: the driver that will be used to
access the data and the location of the database you want to access. To specify data sources, you must
have the appropriate drivers installed. Drivers for a variety of database formats are included with the
installation media.
To access OLE DB data sources (available only on Microsoft Windows operating systems), you must have
the following items installed:
• .NET framework. To obtain the most recent version of the .NET framework, go to
http://www.microsoft.com/net.
• Data Collection Survey Reporter Developer Kit.
The following limitations apply to OLE DB data sources:
• Table joins are not available for OLE DB data sources. You can read only one table at a time.
• You can add OLE DB data sources only in local analysis mode. To add OLE DB data sources in
distributed analysis mode on a Windows server, consult your system administrator.
• In distributed analysis mode (available with IBM SPSS Statistics Server), OLE DB data sources are
available only on Windows servers, and both .NET and Data Collection Survey Reporter Developer Kit
must be installed on the server.
To add an OLE DB data source:
1. Click Add OLE DB Data Source.
2. In Data Link Properties, click the Provider tab and select the OLE DB provider.
3. Click Next or click the Connection tab.
4. Select the database by entering the directory location and database name or by clicking the button to
browse to a database. (A user name and password may also be required.)
5. Click OK after entering all necessary information. (You can make sure the specified database is
available by clicking the Test Connection button.)
6. Enter a name for the database connection information. (This name will be displayed in the list of
available OLE DB data sources.)
7. Click OK.
This takes you back to the first screen of the Database Wizard, where you can select the saved name from
the list of OLE DB data sources and continue to the next step of the wizard.
Deleting OLE DB Data Sources
To delete data source names from the list of OLE DB data sources, delete the UDL file with the name of
the data source in:
[drive]:\Documents and Settings\[user login]\Local Settings\Application Data\SPSS\UDL
Selecting Data Fields
The Select Data step controls which tables and fields are read. Database fields (columns) are read as
variables.
If a table has any field(s) selected, all of its fields will be visible in the following Database Wizard
windows, but only fields that are selected in this step will be imported as variables. This enables you to
create table joins and to specify criteria by using fields that you are not importing.
Displaying field names. To list the fields in a table, click the plus sign (+) to the left of a table name. To
hide the fields, click the minus sign (–) to the left of a table name.
To add a field. Double-click any field in the Available Tables list, or drag it to the Retrieve Fields In This
Order list. Fields can be reordered by dragging and dropping them within the fields list.
To remove a field. Double-click any field in the Retrieve Fields In This Order list, or drag it to the
Available Tables list.
Sort field names. If this check box is selected, the Database Wizard will display your available fields in
alphabetical order.
By default, the list of available tables displays only standard database tables. You can control the type of
items that are displayed in the list:
• Tables. Standard database tables.
• Views. Views are virtual or dynamic "tables" defined by queries. These can include joins of multiple
tables and/or fields derived from calculations based on the values of other fields.
• Synonyms. A synonym is an alias for a table or view, typically defined in a query.
• System tables. System tables define database properties. In some cases, standard database tables may
be classified as system tables and will only be displayed if you select this option. Access to real system
tables is often restricted to database administrators.
Note: For OLE DB data sources (available only on Windows operating systems), you can select fields only
from a single table. Multiple table joins are not supported for OLE DB data sources.
Creating a Relationship between Tables
The Specify Relationships step allows you to define the relationships between the tables for ODBC data
sources. If fields from more than one table are selected, you must define at least one join.
Establishing relationships. To create a relationship, drag a field from any table onto the field to which
you want to join it. The Database Wizard will draw a join line between the two fields, indicating their
relationship. These fields must be of the same data type.
Auto Join Tables. Attempts to automatically join tables based on primary/foreign keys or matching field
names and data type.
Join Type. If outer joins are supported by your driver, you can specify inner joins, left outer joins, or
right outer joins.
• Inner joins. An inner join includes only rows where the related fields are equal. For example, if two
tables are joined on an ID field, all rows with matching ID values in the two tables are included.
• Outer joins. In addition to one-to-one matching with inner joins, you can also use outer joins to merge
tables with a one-to-many matching scheme. For example, you could match a table in which there are
only a few records representing data values and associated descriptive labels with values in a table
containing hundreds or thousands of records representing survey respondents. A left outer join
includes all records from the table on the left and, from the table on the right, includes only those
records in which the related fields are equal. In a right outer join, the join imports all records from the
table on the right and, from the table on the left, imports only those records in which the related fields
are equal.
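The three join types can be compared on a tiny example. The Python sketch below uses plain lists of dictionaries instead of a database; the `join` function, the table contents, and the field names are all hypothetical. A right outer join is modeled as a left outer join with the tables swapped.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`; how is 'inner' or 'left'."""
    rows = []
    for left_row in left:
        matches = [r for r in right if r[key] == left_row[key]]
        if matches:
            rows.extend({**left_row, **r} for r in matches)
        elif how == "left":
            # Keep the unmatched left row; right-hand fields stay absent (missing)
            rows.append(dict(left_row))
    return rows


labels = [
    {"id": 1, "label": "North"},
    {"id": 2, "label": "South"},
    {"id": 3, "label": "West"},
]
respondents = [
    {"name": "Ana", "id": 1},
    {"name": "Ben", "id": 1},
    {"name": "Cy", "id": 2},
    {"name": "Di", "id": 9},
]

inner = join(labels, respondents, "id")               # 3 rows: Ana, Ben, Cy
left_outer = join(labels, respondents, "id", "left")  # adds label "West"
right_outer = join(respondents, labels, "id", "left") # adds respondent "Di"
print(len(inner), len(left_outer), len(right_outer))  # 3 4 4
```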
Computing New Fields
If you are in distributed mode, connected to a remote server (available with IBM SPSS Statistics Server),
you can compute new fields before you read the data into IBM SPSS Statistics.
You can also compute new fields after you read the data into IBM SPSS Statistics, but computing new
fields in the database can save time for large data sources.
New Field Name. The name must comply with IBM SPSS Statistics variable name rules.
Expression. Enter the expression to compute the new field. You can drag existing field names from the
Fields list and functions from the Functions list.
Limiting Retrieved Cases
The Limit Retrieved Cases step allows you to specify the criteria to select subsets of cases (rows).
Limiting cases generally consists of filling the criteria grid with criteria. Criteria consist of two
expressions and some relation between them. The expressions return a value of true, false, or missing for
each case.
• If the result is true, the case is selected.
• If the result is false or missing, the case is not selected.
• Most criteria use one or more of the six relational operators (<, >, <=, >=, =, and <>).
• Expressions can include field names, constants, arithmetic operators, numeric and other functions, and
logical variables. You can use fields that you do not plan to import as variables.
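The selection rule has three outcomes, not two, which matters when fields contain missing values. The Python sketch below models it with `None` as the missing value. It assumes that a comparison involving a missing operand yields missing, as with SQL NULL; the names `evaluate` and `select_cases` are hypothetical.

```python
import operator

# The six relational operators listed above
RELATIONS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "=": operator.eq,
    "<>": operator.ne,
}


def evaluate(left, relation, right):
    """Return True, False, or None (missing) for one criterion."""
    if left is None or right is None:
        return None
    return RELATIONS[relation](left, right)


def select_cases(cases, field, relation, constant):
    """Keep only the cases for which the criterion is true."""
    return [c for c in cases if evaluate(c.get(field), relation, constant) is True]


cases = [{"age": 30}, {"age": 17}, {"age": None}]
print(select_cases(cases, "age", ">=", 18))  # [{'age': 30}]
```

The case with a missing age is dropped by `age >= 18` and would also be dropped by `age < 18`.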
To build your criteria, you need at least two expressions and a relation to connect the expressions.
1. To build an expression, choose one of the following methods:
• In an Expression cell, type field names, constants, arithmetic operators, numeric and other
functions, or logical variables.
• Double-click the field in the Fields list.
• Drag the field from the Fields list to an Expression cell.
• Choose a field from the drop-down menu in any active Expression cell.
2. To choose the relational operator (such as = or >), put your cursor in the Relation cell and either type
the operator or choose it from the drop-down menu.
If the SQL contains WHERE clauses with expressions for case selection, dates and times in expressions
need to be specified in a special manner (including the curly braces shown in the examples):
• Date literals should be specified using the general form {d 'yyyy-mm-dd'}.
• Time literals should be specified using the general form {t 'hh:mm:ss'}.
• Date/time literals (timestamps) should be specified using the general form {ts 'yyyy-mm-dd
hh:mm:ss'}.
• The entire date and/or time value must be enclosed in single quotes. Years must be expressed in
four-digit form, and dates and times must contain two digits for each portion of the value. For
example, January 1, 2005, 1:05 AM would be expressed as:
{ts '2005-01-01 01:05:00'}
Functions. A selection of built-in arithmetic, logical, string, date, and time SQL functions is provided.
You can drag a function from the list into the expression, or you can enter any valid SQL function.
See your database documentation for valid SQL functions.
Use Random Sampling. This option selects a random sample of cases from the data source. For large
data sources, you may want to limit the number of cases to a small, representative sample, which can
significantly reduce the time that it takes to run procedures. Native random sampling, if available for
the data source, is faster than IBM SPSS Statistics random sampling, because IBM SPSS Statistics
random sampling must still read the entire data source to extract a random sample.
• Approximately. Generates a random sample of approximately the specified percentage of cases.
Since this routine makes an independent pseudorandom decision for each case, the percentage of
cases selected can only approximate the specified percentage. The more cases there are in the data
file, the closer the percentage of cases selected is to the specified percentage.
• Exactly. Selects a random sample of the specified number of cases from the specified total number
of cases. If the total number of cases specified exceeds the total number of cases in the data file, the
sample will contain proportionally fewer cases than the requested number.
Note: If you use random sampling, aggregation (available in distributed mode with IBM SPSS
Statistics Server) is not available.
Prompt For Value. You can embed a prompt in your query to create a parameter query. When users
run the query, they will be asked to enter information (based on what is specified here). You might
want to do this if you need to see different views of the same data. For example, you may want to
run the same query to see sales figures for different fiscal quarters.
3. Place your cursor in any Expression cell, and click Prompt For Value to create a prompt.
Creating a Parameter Query
Use the Prompt for Value step to create a dialog box that solicits information from users each time
someone runs your query. This feature is useful if you want to query the same data source by using
different criteria.
To build a prompt, enter a prompt string and a default value. The prompt string is displayed each time a
user runs your query. The string should specify the kind of information to enter. If the user is not
selecting from a list, the string should give hints about how the input should be formatted. An example is
as follows: Enter a Quarter (Q1, Q2, Q3, ...).
Allow user to select value from list. If this check box is selected, you can limit the user to the values that
you place here. Ensure that your values are separated by returns.
Data type. Choose the data type here (Number, String, or Date).
Date and time values must be entered in a special manner:
• Date values must use the general form yyyy-mm-dd.
• Time values must use the general form hh:mm:ss.
• Date/time values (timestamps) must use the general form yyyy-mm-dd hh:mm:ss.
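If the values are produced by a script rather than typed by hand, the required forms can be generated from a date/time object. The Python sketch below is illustrative only: `prompt_values` and `sql_literal` are hypothetical helpers. `sql_literal` additionally wraps a value in the braces-and-single-quotes form that the section on limiting retrieved cases requires inside WHERE clauses.

```python
from datetime import datetime


def prompt_values(moment):
    """Format a datetime in the three forms accepted by a prompt."""
    return {
        "date": moment.strftime("%Y-%m-%d"),
        "time": moment.strftime("%H:%M:%S"),
        "timestamp": moment.strftime("%Y-%m-%d %H:%M:%S"),
    }


def sql_literal(kind, value):
    """Wrap a formatted value as a date (d), time (t), or timestamp (ts) literal."""
    prefix = {"date": "d", "time": "t", "timestamp": "ts"}[kind]
    return f"{{{prefix} '{value}'}}"


values = prompt_values(datetime(2005, 1, 1, 1, 5, 0))
print(values["timestamp"])                            # 2005-01-01 01:05:00
print(sql_literal("timestamp", values["timestamp"]))  # {ts '2005-01-01 01:05:00'}
```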
Aggregating Data
If you are in distributed mode, connected to a remote server (available with IBM SPSS Statistics Server),
you can aggregate the data before reading it into IBM SPSS Statistics.
You can also aggregate data after reading it into IBM SPSS Statistics, but preaggregating may save time
for large data sources.
1. To create aggregated data, select one or more break variables that define how cases are grouped.
2. Select one or more aggregated variables.
3. Select an aggregate function for each aggregate variable.
4. Optionally, create a variable that contains the number of cases in each break group.
Note: If you use IBM SPSS Statistics random sampling, aggregation is not available.
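The four steps above correspond to a group-and-summarize operation. As a rough illustration of what the aggregated result looks like, here is a Python sketch on in-memory rows. The function `aggregate`, the field names, and the output variable name `n_cases` are hypothetical, and the sketch says nothing about how the database or server performs the work.

```python
from collections import defaultdict


def aggregate(cases, break_vars, source, functions):
    """Group cases by break_vars and summarize the `source` field.

    `functions` maps each new variable name to an aggregate function.
    """
    groups = defaultdict(list)
    for case in cases:
        key = tuple(case[name] for name in break_vars)
        groups[key].append(case[source])

    result = []
    for key, values in groups.items():
        row = dict(zip(break_vars, key))
        for new_name, func in functions.items():
            row[new_name] = func(values)
        row["n_cases"] = len(values)  # optional count of cases per break group
        result.append(row)
    return result


sales = [
    {"region": "N", "sales": 10},
    {"region": "N", "sales": 30},
    {"region": "S", "sales": 5},
]
print(aggregate(sales, ["region"], "sales", {"sales_sum": sum, "sales_max": max}))
```

The result has one row per break group, which is why aggregating before reading can shrink a large data source considerably.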
Defining Variables
Variable names and labels. The complete database field (column) name is used as the variable label.
Unless you modify the variable name, the Database Wizard assigns variable names to each column from
the database in one of two ways:
• If the name of the database field forms a valid, unique variable name, the name is used as the variable
name.
• If the name of the database field does not form a valid, unique variable name, a new, unique name is
automatically generated.
Click any cell to edit the variable name.
Converting strings to numeric values. Select the Recode to Numeric box for a string variable if you
want to automatically convert it to a numeric variable. String values are converted to consecutive integer
values based on alphabetical order of the original values. The original values are retained as value labels
for the new variables.
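The conversion described above can be sketched in a few lines of Python. This is an illustration only. It assumes that the integer codes start at 1 and that Python's default string ordering stands in for "alphabetical order"; the text above specifies neither detail. The function name recode_to_numeric is hypothetical.

```python
def recode_to_numeric(values):
    """Map each distinct string to a consecutive integer in sorted order.

    Returns the recoded values and a dict of value labels that keeps the
    original strings, keyed by the new integer codes.
    """
    # Assumption: codes start at 1; sorted() approximates alphabetical order.
    codes = {text: code
             for code, text in enumerate(sorted(set(values)), start=1)}
    value_labels = {code: text for text, code in codes.items()}
    return [codes[value] for value in values], value_labels
```

For the values pear, apple, pear, fig, the sketch returns the codes 3, 1, 3, 2 with the value labels 1 = apple, 2 = fig, 3 = pear.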
Width for variable-width string fields. This option controls the width of variable-width string values. By
default, the width is 255 bytes, and only the first 255 bytes (typically 255 characters in single-byte
languages) will be read. The width can be up to 32,767 bytes. Although you probably don't want to
truncate string values, you also don't want to specify an unnecessarily large value, which will cause
processing to be inefficient.
Minimize string widths based on observed values. Automatically set the width of each string variable
to the longest observed value.
Sorting Cases
If you are in distributed mode, connected to a remote server (available with IBM SPSS Statistics Server),
you can sort the data before reading it into IBM SPSS Statistics.
You can also sort data after reading it into IBM SPSS Statistics, but presorting may save time for large
data sources.
Results
The Results step displays the SQL Select statement for your query.
• You can edit the SQL Select statement before you run the query, but if you click the Back button to
make changes in previous steps, the changes to the Select statement will be lost.
• To save the query for future use, use the Save query to file section.
• To paste complete GET DATA syntax into a syntax window, select Paste it into the syntax editor for
further modification. Copying and pasting the Select statement from the Results window will not
paste the necessary command syntax.
Note: The pasted syntax contains a blank space before the closing quote on each line of SQL that is
generated by the wizard. These blanks are not superfluous. When the command is processed, all lines of
the SQL statement are merged together in a very literal fashion. Without the space, there would be no
space between the last character on one line and the first character on the next line.
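The effect of the trailing blank can be reproduced with plain string concatenation. The following Python sketch is illustrative only. The table and column names are hypothetical, and the function merge_sql_lines only imitates the literal merging described in the note.

```python
def merge_sql_lines(lines):
    """Concatenate SQL fragments exactly as written, adding nothing."""
    return "".join(lines)


# Fragments as the wizard would generate them: a blank before each closing quote.
with_blanks = ["SELECT id, name ", "FROM customers ", "WHERE region = 'East'"]

# The same fragments with the trailing blanks removed.
without_blanks = [line.rstrip() for line in with_blanks]

good_sql = merge_sql_lines(with_blanks)
bad_sql = merge_sql_lines(without_blanks)
```

Here good_sql is a valid statement, while bad_sql runs "name" into "FROM" and "customers" into "WHERE", which is why the blanks must be kept when you edit pasted syntax.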
Reading Cognos BI data
If you have access to an IBM Cognos Business Intelligence server, you can read IBM Cognos Business
Intelligence data packages and list reports into IBM SPSS Statistics.
To read IBM Cognos Business Intelligence data:
1. From the menus choose:
File > Import Data > Cognos Business Intelligence
2. Specify the URL for the IBM Cognos Business Intelligence server connection.
3. Specify the location of the data package or report.
4. Select the data fields or report that you want to read.
Optionally, you can:
• Select filters for data packages.
• Import aggregated data instead of raw data.
• Specify parameter values.
Mode. Specifies the type of information you want to read: Data or Report. The only type of report that
can be read is a list report.
Connection. The URL of the Cognos Business Intelligence server. Click the Edit button to define the
details of a new Cognos connection from which to import data or reports. See the topic “Cognos
connections” on page 21 for more information.
Location. The location of the package or report that you want to read. Click the Edit button to display a
list of available sources from which to import content. See the topic “Cognos location” on page 21 for
more information.
Content. For data, displays the available data packages and filters. For reports, displays the available
reports.
Fields to import. For data packages, select the fields you want to include and move them to this list.
Report to import. For reports, select the list report you want to import. The report must be a list report.
Filters to apply. For data packages, select the filters you want to apply and move them to this list.
Parameters. If this button is enabled, the selected object has parameters defined. You can use parameters
to make adjustments (for example, perform a parameterized calculation) before importing the data. If
parameters are defined but no default is provided, the button displays a warning triangle.
Aggregate data before performing import. For data packages, if aggregation is defined in the package,
you can import the aggregated data instead of the raw data.
Cognos connections
The Cognos Connections dialog specifies the IBM Cognos Business Intelligence server URL and any
required additional credentials.
Cognos server URL. The URL of the IBM Cognos Business Intelligence server. This is the value of the
"external dispatcher URI" environment property of IBM Cognos Configuration on the server. Contact your
system administrator for more information.
Mode. Select Set Credentials if you need to log in with a specific namespace, username and password
(for example, as an administrator). Select Use Anonymous connection to log in with no user credentials,
in which case you do not fill in the other fields. Select Stored Credentials to use the login information
from a stored credential. To use a stored credential, you must be connected to the IBM SPSS
Collaboration and Deployment Services Repository that contains the credential. After you are connected
to the repository, click Browse to see the list of available credentials.
Namespace ID. The security authentication provider used to log on to the server. The authentication
provider is used to define and maintain users, groups, and roles, and to control the authentication
process.
User name. Enter the user name with which to log on to the server.
Password. Enter the password associated with the specified user name.
Save as Default. Saves these settings as your default, to avoid having to re-enter them each time.
Cognos location
The Specify Location dialog box enables you to select a package from which to import data, or a package
or folder from which to import reports. It displays the public folders that are available to you. If you
select Data in the main dialog, the list will display folders containing data packages. If you select Report
in the main dialog, the list will display folders containing list reports. Select the location you want by
navigating through the folder structure.
Specifying parameters for data or reports
If parameters have been defined, either for a data object or a report, you can specify values for these
parameters before importing the data or report. An example of parameters for a report would be start
and end dates for the report contents.
Name. The parameter name as it is specified in the IBM Cognos Business Intelligence database.
Type. A description of the parameter.
Value. The value to assign to the parameter. To enter or edit a value, double-click its cell in the table.
Values are not validated here; any invalid values are detected at run time.
Automatically remove invalid parameters from table. This option is selected by default and will remove
any invalid parameters found within the data object or report.
Changing variable names
For IBM Cognos Business Intelligence data packages, package field names are automatically converted to
valid variable names. You can use the Fields tab of the Read Cognos Data dialog to override the default
names. Names must be unique and must conform to variable naming rules. See the topic “Variable
names” on page 52 for more information.
Reading Cognos TM1 data
If you have access to an IBM Cognos TM1® database, you can import TM1 data from a specified view
into IBM SPSS Statistics. The multidimensional OLAP cube data from TM1 is flattened when read into
SPSS Statistics.
Important: To enable the exchange of data between SPSS Statistics and TM1, you must copy the
following three processes from SPSS Statistics to the TM1 server: ExportToSPSS.pro, ImportFromSPSS.pro,
and SPSSCreateNewMeasures.pro. To add these processes to the TM1 server, you must copy them to the
data directory of the TM1 server and restart the TM1 server. These files are available from the
common/scripts/TM1 directory under the SPSS Statistics installation directory.
Restriction:
• The TM1 view from which you import must include one or more elements from a measure dimension.
• The data to be imported from TM1 must be in UTF-8 format.
All of the data in the specified TM1 view are imported. It is therefore best to limit the view to the data
that are required for the analysis. Any necessary filtering of the data is best done in TM1, for example
with the TM1 Subset Editor.
To read TM1 data:
1. From the menus, choose:
File > Import Data > Cognos TM1
2. Connect to the TM1 Performance Management system.
3. Log in to the TM1 server.
4. Select a TM1 cube and select the view that you want to import.
Optionally, you can override the default names of the SPSS Statistics variables that are created from the
names of the TM1 dimensions and measures.
PM System
The URL of the Performance Management system that contains the TM1 server to which you
want to connect. The Performance Management system is defined as a single URL for all TM1
servers. From this URL, all TM1 servers that are installed and running on your environment can
be discovered and accessed. Enter the URL and click Connect.
TM1 Server
When the connection to the Performance Management system is established, select the server that
contains the data you want to import and click Login. If you did not previously connect to this
server, you are prompted to log in.
Username and password
Select this option to log in with a specific username and password. If the server uses
authentication mode 5 (IBM Cognos security), then select the namespace that identifies
the security authentication provider from the available list.
Stored credential
Select this option to use the login information from a stored credential. To use a stored
credential, you must be connected to the IBM SPSS Collaboration and Deployment
Services Repository that contains the credential. After you are connected to the repository,
click Browse to see the list of available credentials.
Select a TM1 cube view to import
Lists the names of the cubes within the TM1 server from which you can import data.
Double-click a cube to display a list of the views that you can import. Select a view and click the
right arrow to move it into the View to import field.
Column dimension(s)
Lists the names of the column dimensions in the selected view.
Row dimension(s)
Lists the names of the row dimensions in the selected view.
Context dimension(s)
Lists the names of the context dimensions in the selected view.
Note:
• When the data are imported, a separate SPSS Statistics variable is created for each regular dimension
and for each element in the measure dimension.
• Empty cells and cells with a value of zero in TM1 are converted to the system-missing value.
• Cells with string values that cannot be converted to a numeric value are converted to the
system-missing value.
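The cell conversion rules in the note above can be sketched as a small Python function. This is an illustration under stated assumptions, not the import routine itself: None stands in for the system-missing value, cells arrive as numbers, strings, or None, and a string that does convert to a number is treated like a numeric cell. The function name tm1_cell_to_value is hypothetical.

```python
def tm1_cell_to_value(cell):
    """Return a float for a usable cell, or None for system-missing."""
    # Empty cells become system-missing.
    if cell is None or cell == "":
        return None
    try:
        number = float(cell)
    except (TypeError, ValueError):
        # Strings that cannot be converted become system-missing.
        return None
    # Cells with a value of zero also become system-missing.
    return None if number == 0 else number
```

Because zero is converted to system-missing, a view in which zero is a meaningful value may need to be adjusted in TM1 before it is imported.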
Changing variable names
By default, valid IBM SPSS Statistics variable names are automatically generated from the dimension
names and names of elements in the measure dimension from the selected IBM Cognos TM1 cube view.
You can use the Fields tab of the Import from TM1 dialog to override the default names. Names must be
unique and must conform to variable naming rules.
Reading Data Collection Data
On Microsoft Windows operating systems, you can read data from Data Collection products. (Note: This
feature is only available with IBM SPSS Statistics installed on Microsoft Windows operating systems.)
To read Data Collection data sources, you must have the following items installed:
• .NET framework. To obtain the most recent version of the .NET framework, go to
http://www.microsoft.com/net.
• Data Collection Survey Reporter Developer Kit.
You can read Data Collection data sources only in local analysis mode. This feature is not available in
distributed analysis mode using IBM SPSS Statistics Server.
To read data from a Data Collection data source:
1. In any open IBM SPSS Statistics window, from the menus choose:
File > Import Data > Data Collection
2. On the Connection tab of Data Link Properties, specify the metadata file, the case data type, and the
case data file.
3. Click OK.
4. In the Data Collection Data Import dialog box, select the variables that you want to include and select
any case selection criteria.
5. Click OK to read the data.
Data Link Properties Connection tab
To read a Data Collection data source, you need to specify:
Metadata Location. The metadata document file (.mdd) that contains questionnaire definition information.
Case Data Type. The format of the case data file. Available formats include:
• Quancept Data File (DRS). Case data in a Quancept .drs, .drz, or .dru file.
• Quanvert Database. Case data in a Quanvert database.
• Data Collection Database (MS SQL Server). Case data in a relational database in SQL Server.
• Data Collection XML Data File. Case data in an XML file.
Case Data Location. The file that contains the case data. The format of this file must be consistent with
the selected case data type.
Note: The extent to which other settings on the Connection tab or any settings on the other Data Link
Properties tabs may or may not affect reading Data Collection data into IBM SPSS Statistics is not known,
so we recommend that you do not change any of them.
Select Variables tab
You can select a subset of variables to read. By default, all standard variables in the data source are
displayed and selected.
• Show System variables. Displays any "system" variables, including variables that indicate interview
status (in progress, completed, finish date, and so on). You can then select any system variables that you
want to include. By default, all system variables are excluded.
• Show Codes variables. Displays any variables that represent codes that are used for open-ended
"Other" responses for categorical variables. You can then select any Codes variables that you want to
include. By default, all Codes variables are excluded.
• Show SourceFile variables. Displays any variables that contain filenames of images of scanned
responses. You can then select any SourceFile variables that you want to include. By default, all
SourceFile variables are excluded.
Case Selection Tab
For Data Collection data sources that contain system variables, you can select cases based on a number of
system variable criteria. You do not need to include the corresponding system variables in the list of
variables to read, but the necessary system variables must exist in the source data to apply the selection
criteria. If the necessary system variables do not exist in the source data, the corresponding selection
criteria are ignored.
Data collection status. You can select respondent data, test data, or both. You can also select cases based
on any combination of the following interview status parameters:
• Completed successfully
• Active/in progress
• Timed out
• Stopped by script
• Stopped by respondent
• Interview system shutdown
• Signal (terminated by a signal statement in the script)
Data collection finish date. You can select cases based on the data collection finish date.
• Start Date. Cases for which data collection finished on or after the specified date are included.
• End Date. Cases for which data collection finished before the specified date are included. This does not
include cases for which data collection finished on the end date.
• If you specify both a start date and end date, this defines a range of finish dates from the start date to
(but not including) the end date.
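The finish date criteria above describe a range that includes the start date and excludes the end date. The following is a minimal Python sketch of that rule. The function name in_finish_range is hypothetical, and the sketch works on whole dates only.

```python
from datetime import date


def in_finish_range(finish, start=None, end=None):
    """Return True if a finish date satisfies the start/end criteria.

    The start date is inclusive and the end date is exclusive. A bound
    that is None is treated as not specified.
    """
    if start is not None and finish < start:
        return False
    if end is not None and finish >= end:
        return False
    return True
```

To select every case that finished in March 2016, the sketch would therefore be called with a start date of date(2016, 3, 1) and an end date of date(2016, 4, 1), not date(2016, 3, 31).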
File information
A data file contains much more than raw data. It also contains any variable definition information,
including:
• Variable names
• Variable formats
• Descriptive variable and value labels
This information is stored in the dictionary portion of the data file. The Data Editor provides one way to
view the variable definition information. You can also display complete dictionary information for the
active dataset or any other data file.
To Display Data File Information
1. From the menus in the Data Editor window choose:
File > Display Data File Information
2. For the currently open data file, choose Working File.
3. For other data files, choose External File, and then select the data file.
The data file information is displayed in the Viewer.
Saving data files
In addition to saving data files in IBM SPSS Statistics format, you can save data in a wide variety of
external formats, including:
• Excel and other spreadsheet formats
• Tab-delimited and CSV text files
• SAS
• Stata
• Database tables
To save modified data files
1. Make the Data Editor the active window (click anywhere in the window to make it active).
2. From the menus choose:
File > Save
The modified data file is saved, overwriting the previous version of the file.
To save data files in code page character encoding
Unicode data files cannot be read by IBM SPSS Statistics versions prior to version 16.0. In Unicode mode,
to save a data file in code page character encoding:
1. Make the Data Editor the active window (click anywhere in the window to make it active).
2. From the menus choose:
File > Save As
3. From the Save as type drop-down list in the Save Data dialog, select SPSS Statistics Local Encoding.
4. Enter a name for the new data file.
The modified data file is saved in the current locale code page character encoding. This action has no
effect on the active dataset. The encoding of the active dataset is not changed. Saving a file in code page
character encoding is similar to saving a file in an external format, such as tab-delimited text or Excel.
Saving data files in external formats
1. Make the Data Editor the active window (click anywhere in the window to make it active).
2. From the menus choose:
File > Save As...
3. Select a file type from the drop-down list.
4. Enter a file name for the new data file.
Options
Depending on the file type, additional options are available.
Encoding
Available for SAS files and text data formats: tab-delimited, comma-delimited, and fixed ASCII
text.
Write variable names to file
Available for Excel, tab-delimited, comma-delimited, 1-2-3, and SYLK. For Excel 97 and later
versions, you can write either variable names or labels. For variables without defined variable
labels, the variable name is used.
Sheet name
For Excel 2007 and later versions, you can specify a sheet name. You can also append a sheet to
an existing file.
Save value labels to a .sas file
SAS 6 and later versions.
For information on exporting data to database tables, see “Exporting to a Database” on page 32.
Saving data: Data file types
You can save data in the following formats:
SPSS Statistics (*.sav). IBM SPSS Statistics format.
• Data files saved in IBM SPSS Statistics format cannot be read by versions of the software prior to
version 7.5. Data files saved in Unicode encoding cannot be read by releases of IBM SPSS Statistics
prior to version 16.0.
• When using data files with variable names longer than eight bytes in version 10.x or 11.x, unique,
eight-byte versions of variable names are used, but the original variable names are preserved for use
in release 12.0 or later. In releases prior to 10.0, the original long variable names are lost if you save the
data file.
• When using data files with string variables longer than 255 bytes in versions prior to release 13.0, those
string variables are broken up into multiple 255-byte string variables.
SPSS Statistics Compressed (*.zsav). Compressed IBM SPSS Statistics format.
• ZSAV files have the same features as SAV files, but they take up less disk space.
• ZSAV files may take more or less time to open and save, depending on the file size and system
configuration. Extra time is needed to de-compress and compress ZSAV files. However, because ZSAV
files are smaller on disk, they reduce the time needed to read and write from disk. As the file size gets
larger, this time savings surpasses the extra time needed to de-compress and compress the files.
• Only IBM SPSS Statistics version 21 or higher can open ZSAV files.
• The option to save the data file with your local code page encoding is not available for ZSAV files.
These files are always saved in UTF-8 encoding.
SPSS Statistics Local Encoding (*.sav). In Unicode mode, this option saves the data file in the current
locale code page character encoding. This option is not available in code page mode.
SPSS 7.0 (*.sav). Version 7.0 format. Data files saved in version 7.0 format can be read by version 7.0 and
earlier versions but do not include defined multiple response sets or Data Entry for Windows
information.
SPSS/PC+ (*.sys). SPSS/PC+ format. If the data file contains more than 500 variables, only the first 500
will be saved. For variables with more than one defined user-missing value, additional user-missing
values will be recoded into the first defined user-missing value. This format is available only on
Windows operating systems.
Portable (*.por). Portable format that can be read by other versions of IBM SPSS Statistics and versions
on other operating systems. Variable names are limited to eight bytes and are automatically converted to
unique eight-byte names if necessary. In most cases, saving data in portable format is no longer necessary,
since IBM SPSS Statistics data files should be platform/operating system independent. You cannot save
data files in portable format in Unicode mode. See the topic “General options” on page 195 for more
information.
Tab-delimited (*.dat). Text files with values separated by tabs. (Note: Tab characters embedded in string
values are preserved as tab characters in the tab-delimited file. No distinction is made between tab
characters embedded in values and tab characters that separate values.) You can save files in Unicode
encoding or local code page encoding.
Comma-delimited (*.csv). Text files with values separated by commas or semicolons. If the current IBM
SPSS Statistics decimal indicator is a period, values are separated by commas. If the current decimal
indicator is a comma, values are separated by semicolons. You can save files in Unicode encoding or local
code page encoding.
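The separator rule for comma-delimited files can be sketched with Python's standard csv module. This is an illustration, not the code that IBM SPSS Statistics uses. The function names csv_delimiter and write_cases are hypothetical, and formatting floating-point values with the decimal indicator is an assumption added to show why the separator has to change.

```python
import csv
import io


def csv_delimiter(decimal_indicator):
    """Pick the value separator from the current decimal indicator."""
    if decimal_indicator == ".":
        return ","
    if decimal_indicator == ",":
        return ";"
    raise ValueError("decimal indicator must be '.' or ','")


def write_cases(rows, decimal_indicator):
    """Write rows as delimited text and return the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer,
                        delimiter=csv_delimiter(decimal_indicator),
                        lineterminator="\n")
    for row in rows:
        # Assumption: floats are written with the decimal indicator.
        writer.writerow([
            str(value).replace(".", decimal_indicator)
            if isinstance(value, float) else value
            for value in row
        ])
    return buffer.getvalue()
```

With a comma as the decimal indicator, the value 2.5 is written as 2,5, so a semicolon separator keeps it from being read as two values.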
Fixed ASCII (*.dat). Text file in fixed format, using the default write formats for all variables. There are
no tabs or spaces between variable fields. You can save files in Unicode encoding or local code page
encoding.
Excel 2007 (*.xlsx). Microsoft Excel 2007 XLSX-format workbook. The maximum number of variables is
16,000; any additional variables beyond the first 16,000 are dropped. If the dataset contains more than one
million cases, multiple sheets are created in the workbook.
Excel 97 through 2003 (*.xls). Microsoft Excel 97 workbook. The maximum number of variables is 256;
any additional variables beyond the first 256 are dropped. If the dataset contains more than 65,536 cases,
multiple sheets are created in the workbook.
Excel 2.1 (*.xls). Microsoft Excel 2.1 spreadsheet file. The maximum number of variables is 256, and the
maximum number of rows is 16,384.
1-2-3 Release 3.0 (*.wk3). Lotus 1-2-3 spreadsheet file, release 3.0. The maximum number of variables that
you can save is 256.
1-2-3 Release 2.0 (*.wk1). Lotus 1-2-3 spreadsheet file, release 2.0. The maximum number of variables that
you can save is 256.
1-2-3 Release 1.0 (*.wks). Lotus 1-2-3 spreadsheet file, release 1A. The maximum number of variables that
you can save is 256.
SYLK (*.slk). Symbolic link format for Microsoft Excel and Multiplan spreadsheet files. The maximum
number of variables that you can save is 256.
dBASE IV (*.dbf). dBASE IV format.
dBASE III (*.dbf). dBASE III format.
dBASE II (*.dbf). dBASE II format.
SAS v9+ Windows (*.sas7bdat). SAS versions 9 for Windows. You can save files in Unicode (UTF-8) or
local code page encoding.
SAS v9+ UNIX (*.sas7bdat). SAS versions 9 for UNIX. You can save files in Unicode (UTF-8) or local
code page encoding.
SAS v7-8 Windows short extension (*.sd7). SAS versions 7–8 for Windows short filename format.
SAS v7-8 Windows long extension (*.sas7bdat). SAS versions 7–8 for Windows long filename format.
SAS v7-8 for UNIX (*.sas7bdat). SAS v8 for UNIX.
SAS v6 for Windows (*.sd2). SAS v6 file format for Windows/OS2.
SAS v6 for UNIX (*.ssd01). SAS v6 file format for UNIX (Sun, HP, IBM).
SAS v6 for Alpha/OSF (*.ssd04). SAS v6 file format for Alpha/OSF (DEC UNIX).
SAS Transport (*.xpt). SAS transport file.
Stata Version 13 Intercooled (*.dta).
Stata Version 13 SE (*.dta).
Stata Version 12 Intercooled (*.dta).
Stata Version 12 SE (*.dta).
Stata Version 11 Intercooled (*.dta).
Stata Version 11 SE (*.dta).
Stata Version 10 Intercooled (*.dta).
Stata Version 10 SE (*.dta).
Stata Version 9 Intercooled (*.dta).
Stata Version 9 SE (*.dta).
Stata Version 8 Intercooled (*.dta).
Stata Version 8 SE (*.dta).
Stata Version 7 Intercooled (*.dta).
Stata Version 7 SE (*.dta).
Stata Version 6 (*.dta).
Stata Versions 4–5 (*.dta).
Note: SAS data file names can be up to 32 characters in length. Blank spaces and non-alphanumeric
characters other than the underscore ("_") are not allowed. Names must start with a letter or an
underscore; numbers can follow.
Saving data files in Excel format
You can save your data in one of three Microsoft Excel file formats: Excel 2.1, Excel 97, and Excel 2007.
• Excel 2.1 and Excel 97 are limited to 256 columns, so only the first 256 variables are included.
• Excel 2007 is limited to 16,000 columns, so only the first 16,000 variables are included.
• Excel 2.1 is limited to 16,384 rows, so only the first 16,384 cases are included.
• Excel 97 and Excel 2007 also have limits on the number of rows per sheet, but workbooks can have
multiple sheets, and multiple sheets are created if the single-sheet maximum is exceeded.
Options
• For all versions of Excel, you can include variable names as the first row of the Excel file.
• For Excel 97 and later versions, you can write either variable names or labels. For variables without
defined variable labels, the variable name is used.
• For Excel 2007 and later versions, you can specify a sheet name. You can also append a sheet to an
existing file.
Variable types
The following table shows the variable type matching between the original data in IBM SPSS Statistics
and the exported data in Excel.
Table 2. How Excel data formats map to IBM SPSS Statistics variable types and formats
IBM SPSS Statistics Variable Type    Excel Data Format
Numeric                              0.00; #,##0.00; ...
Comma                                0.00; #,##0.00; ...
Dollar                               $#,##0_); ...
Date                                 d-mmm-yyyy
Time                                 hh:mm:ss
String                               General
Saving data files in SAS format
Special handling is given to various aspects of your data when saved as a SAS file. These cases include:
• Certain characters that are allowed in IBM SPSS Statistics variable names are not valid in SAS, such as
@, #, and $. These illegal characters are replaced with an underscore when the data are exported.
• IBM SPSS Statistics variable names that contain multibyte characters (for example, Japanese or Chinese
characters) are converted to variable names of the general form Vnnn, where nnn is an integer value.
• IBM SPSS Statistics variable labels containing more than 40 characters are truncated when exported to
a SAS v6 file.
• Where they exist, IBM SPSS Statistics variable labels are mapped to the SAS variable labels. If no
variable label exists in the IBM SPSS Statistics data, the variable name is mapped to the SAS variable
label.
• SAS allows only one value for system-missing, whereas IBM SPSS Statistics allows numerous
user-missing values in addition to system-missing. As a result, all user-missing values in IBM SPSS
Statistics are mapped to a single system-missing value in the SAS file.
• SAS 6-8 data files are saved in the current IBM SPSS Statistics locale encoding, regardless of current
mode (Unicode or code page). In Unicode mode, SAS 9 files are saved in UTF-8 format. In code page
mode, SAS 9 files are saved in the current locale encoding.
• A maximum of 32,767 variables can be saved to SAS 6-8.
• SAS data file names can be up to 32 characters in length. Blank spaces and non-alphanumeric
characters other than the underscore ("_") are not allowed. Names must start with a letter or an
underscore; numbers can follow.
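Two of the naming rules above lend themselves to a quick check before export. The following Python sketch is an illustration under stated assumptions: "letter" is taken to mean an unaccented ASCII letter, and only the three characters named above (@, #, and $) are replaced, although the text presents them as examples rather than a complete list. The function names are hypothetical.

```python
import re

# A letter or underscore, followed by up to 31 letters, digits, or
# underscores, for a maximum length of 32 characters.
_SAS_FILE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_sas_file_name(name):
    """Return True if name follows the SAS data file naming rule above."""
    return _SAS_FILE_NAME.fullmatch(name) is not None


def replace_invalid_sas_chars(variable_name):
    """Replace @, #, and $ in a variable name with underscores."""
    return re.sub(r"[@#$]", "_", variable_name)
```

For example, sales_2016 passes the file name check and 2016sales does not, and the variable name #temp$ becomes _temp_.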
Save value labels
You have the option of saving the values and value labels associated with your data file to a SAS syntax
file. This syntax file contains proc format and proc datasets commands that can be run in SAS to create
a SAS format catalog file.
This feature is not supported for the SAS transport file.
Variable types
The following table shows the variable type matching between the original data in IBM SPSS Statistics
and the exported data in SAS.
Table 3. How SAS variable types and formats map to IBM SPSS Statistics types and formats
IBM SPSS Statistics Variable Type    SAS Variable Type    SAS Data Format
Numeric                              Numeric              12
Comma                                Numeric              12
Dot                                  Numeric              12
Scientific Notation                  Numeric              12
Date                                 Numeric              (Date) for example, MMDDYY10, ...
Date (Time)                          Numeric              Time18
Dollar                               Numeric              12
Custom Currency                      Numeric              12
String                               Character            $8
Saving data files in Stata format
• Data can be written in Stata 5–13 format and in both Intercooled and SE format (version 7 or later).
• Data files that are saved in Stata 5 format can be read by Stata 4.
• The first 80 bytes of variable labels are saved as Stata variable labels.
• For Stata releases 4-8, the first 80 bytes of value labels for numeric variables are saved as Stata value
labels. For Stata release 9 or later, the complete value labels for numeric variables are saved. Value
labels are dropped for string variables, non-integer numeric values, and numeric values greater than an
absolute value of 2,147,483,647.
• For versions 7 and later, the first 32 bytes of variable names in case-sensitive form are saved as Stata
variable names. For earlier versions, the first eight bytes of variable names are saved as Stata variable
names. Any characters other than letters, numbers, and underscores are converted to underscores.
• IBM SPSS Statistics variable names that contain multi-byte characters (for example, Japanese or Chinese
characters) are converted to generic single-byte variable names.
• For versions 5–6 and Intercooled versions 7 and later, the first 80 bytes of string values are saved. For
Stata SE 7–12, the first 244 bytes of string values are saved. For Stata SE 13 or later, complete string
values are saved, regardless of length.
• For versions 5–6 and Intercooled versions 7 and later, only the first 2,047 variables are saved. For Stata
SE 7 or later, only the first 32,767 variables are saved.
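The variable name rules above can be previewed with a short Python sketch. It is an illustration only and assumes single-byte (ASCII) names, because names that contain multi-byte characters are converted to generic names by a rule that is not specified here. The function name stata_variable_name is hypothetical.

```python
import re


def stata_variable_name(name, stata_version):
    """Approximate the variable name that is saved for a Stata version.

    Characters other than letters, numbers, and underscores become
    underscores. The result keeps the first 32 bytes for version 7 and
    later, and the first eight bytes for earlier versions.
    """
    limit = 32 if stata_version >= 7 else 8
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # After cleaning, every character is one byte, so slicing by
    # character is the same as slicing by byte.
    return cleaned[:limit]
```

Names that differ only after the cutoff collapse to the same result in this sketch, so it can also be used to spot likely collisions before exporting to an older Stata format.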
Table 4. How Stata variable type and format map to IBM SPSS Statistics type and format
IBM SPSS Statistics Variable Type    Stata Variable Type    Stata Data Format
Numeric                              Numeric                g
Comma                                Numeric                g
Dot                                  Numeric                g
Scientific Notation                  Numeric                g
Date*, Datetime                      Numeric                D_m_Y
Time, DTime                          Numeric                g (number of seconds)
Wkday                                Numeric                g (1–7)
Month                                Numeric                g (1–12)
Dollar                               Numeric                g
Custom Currency                      Numeric                g
String                               String                 s
*Date, Adate, Edate, SDate, Jdate, Qyr, Moyr, Wkyr
Saving Subsets of Variables
The Save Data As Variables dialog box allows you to select the variables that you want saved in the new
data file. By default, all variables will be saved. Deselect the variables that you don't want to save, or
click Drop All and then select the variables that you want to save.
Visible Only. Selects only variables in variable sets currently in use. See the topic “Using variable sets to
show and hide variables” on page 192 for more information.
To Save a Subset of Variables
1. Make the Data Editor the active window (click anywhere in the window to make it active).
2. From the menus choose:
File > Save As...
3. Click Variables.
4. Select the variables that you want to save.
Encrypting data files
For IBM SPSS Statistics data files, you can protect confidential information stored in a data file by
encrypting the file with a password. Once encrypted, the file can only be opened by providing the
password.
1. Make the Data Editor the active window (click anywhere in the window to make it active).
2. From the menus choose:
File > Save As...
3. Select Encrypt file with password in the Save Data As dialog box.
4. Click Save.
5. In the Encrypt File dialog box, provide a password and re-enter it in the Confirm password text box.
Passwords are limited to 10 characters and are case-sensitive.
Warning: Passwords cannot be recovered if they are lost. If the password is lost the file cannot be opened.
Creating strong passwords
• Use eight or more characters.
• Include numbers, symbols and even punctuation in your password.
• Avoid sequences of numbers or characters, such as "123" and "abc", and avoid repetition, such as
"111aaa".
• Do not create passwords that use personal information such as birthdays or nicknames.
• Periodically change the password.
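The guidelines above, together with the 10-character limit of the Encrypt File dialog box, leave a narrow window of 8 to 10 characters. The following Python sketch checks a candidate password against the rules that can be tested mechanically. It is an illustration only and is not part of IBM SPSS Statistics. The function name, the messages, and the choice to flag runs of three characters are assumptions.

```python
import string

MIN_LENGTH = 8   # "Use eight or more characters"
MAX_LENGTH = 10  # limit stated for the Encrypt File dialog box

TOO_LONG = "longer than 10 characters"
TOO_SHORT = "shorter than 8 characters"
NO_DIGIT = "contains no numbers"
NO_SYMBOL = "contains no symbols or punctuation"
REPEATED = "repeats a character three or more times"
SEQUENCE = "contains a run such as 'abc' or '123'"


def password_problems(password):
    """Return a list of guideline violations (empty if none are found)."""
    problems = []
    if len(password) > MAX_LENGTH:
        problems.append(TOO_LONG)
    if len(password) < MIN_LENGTH:
        problems.append(TOO_SHORT)
    if not any(ch.isdigit() for ch in password):
        problems.append(NO_DIGIT)
    if not any(ch in string.punctuation for ch in password):
        problems.append(NO_SYMBOL)
    # Look at every window of three neighbouring characters.
    lowered = password.lower()
    triples = list(zip(lowered, lowered[1:], lowered[2:]))
    if any(a == b == c for a, b, c in triples):
        problems.append(REPEATED)
    if any(ord(b) - ord(a) == 1 and ord(c) - ord(b) == 1 for a, b, c in triples):
        problems.append(SEQUENCE)
    return problems
```

The check cannot detect personal information such as birthdays or nicknames, so that guideline still has to be applied by the person choosing the password.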
Note: Storing encrypted files to an IBM SPSS Collaboration and Deployment Services Repository is not
supported.
Modifying encrypted files
• If you open an encrypted file, make modifications to it and choose File > Save, the modified file will be
saved with the same password.
• You can change the password on an encrypted file by opening the file, repeating the steps for
encrypting it, and specifying a different password in the Encrypt File dialog box.
• You can save an unencrypted version of an encrypted file by opening the file, choosing File > Save As
and deselecting Encrypt file with password in the Save Data As dialog box.
Note: Encrypted data files and output documents cannot be opened in versions of IBM SPSS Statistics
prior to version 21. Encrypted syntax files cannot be opened in versions prior to version 22.
Exporting to a Database
You can use the Export to Database Wizard to:
• Replace values in existing database table fields (columns) or add new fields to a table.
• Append new records (rows) to a database table.
• Completely replace a database table or create a new table.
To export data to a database:
1. From the menus in the Data Editor window for the dataset that contains the data you want to export,
choose:
File > Export > Database
2. Select the database source.
3. Follow the instructions in the export wizard to export the data.
Creating Database Fields from IBM SPSS Statistics Variables
When creating new fields (adding fields to an existing database table, creating a new table, replacing a
table), you can specify field names, data type, and width (where applicable).
Field name. The default field names are the same as the IBM SPSS Statistics variable names. You can
change the field names to any names allowed by the database format. For example, many databases
allow characters in field names that aren't allowed in variable names, including spaces. Therefore, a
variable name like CallWaiting could be changed to the field name Call Waiting.
Type. The export wizard makes initial data type assignments based on the standard ODBC data types or the
data types allowed by the selected database format that most closely match the defined IBM SPSS
Statistics data format. However, databases can make type distinctions that have no direct equivalent in
IBM SPSS Statistics, and vice versa. For example, most numeric values in IBM SPSS Statistics are stored as
double-precision floating-point values, whereas database numeric data types include float (double),
integer, real, and so on. In addition, many databases don't have equivalents to IBM SPSS Statistics time
formats. You can change the data type to any type available in the drop-down list.
As a general rule, the basic data type (string or numeric) for the variable should match the basic data
type of the database field. If there is a data type mismatch that cannot be resolved by the database, an
error results and no data are exported to the database. For example, if you export a string variable to a
database field with a numeric data type, an error will result if any values of the string variable contain
non-numeric characters.
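Before exporting a string variable to a numeric field, it can help to find the values that would trigger such an error. The following Python sketch is a rough pre-check under stated assumptions. The function name is hypothetical, and Python's `float()` is only a stand-in for the database's own parsing rules, which may be stricter (for example, `float()` accepts "1e5" and "inf").

```python
def non_numeric_values(values):
    """Return the string values that cannot be read as numbers."""
    rejected = []
    for value in values:
        try:
            # Surrounding blanks are ignored; an empty string is rejected.
            float(value.strip())
        except ValueError:
            rejected.append(value)
    return rejected
```

For example, `non_numeric_values(["12", "3.5", "N/A", " 7 "])` returns `["N/A"]`, which suggests that the variable needs cleaning or a string field type before it is exported.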
Width. You can change the defined width for string (char, varchar) field types. Numeric field widths are
defined by the data type.
By default, IBM SPSS Statistics variable formats are mapped to database field types based on the
following general scheme. Actual database field types may vary, depending on the database.
Table 5. Format conversion for databases
IBM SPSS Statistics Variable Format   Database Field Type
Numeric                               Float or Double
Comma                                 Float or Double
Dot                                   Float or Double
Scientific Notation                   Float or Double
Date                                  Date or Datetime or Timestamp
Datetime                              Datetime or Timestamp
Time, DTime                           Float or Double (number of seconds)
Wkday                                 Integer (1–7)
Month                                 Integer (1–12)
Dollar                                Float or Double
Custom Currency                       Float or Double
String                                Char or Varchar
User-Missing Values
There are two options for the treatment of user-missing values when data from variables are exported to
database fields:
• Export as valid values. User-missing values are treated as regular, valid, nonmissing values.
• Export numeric user-missing as nulls and export string user-missing values as blank spaces.
Numeric user-missing values are treated the same as system-missing values. String user-missing values
are converted to blank spaces (strings cannot be system-missing).
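The difference between the two options can be sketched as a small Python function. This is an illustration of the described behavior, not the wizard's implementation. The function and parameter names are hypothetical, `None` stands in for a database null, and padding the blank string to the length of the original value is an assumption, because the resulting width is not stated here.

```python
def export_value(value, user_missing, as_valid=True):
    """Return the value as it would be written to the database field."""
    if as_valid or value not in user_missing:
        # Option 1, or an ordinary value: export unchanged.
        return value
    if isinstance(value, str):
        # Option 2, string: blank spaces (strings cannot be system-missing).
        return " " * max(len(value), 1)
    # Option 2, numeric: treated like system-missing, exported as null.
    return None
```

For example, with 99 declared as user-missing, `export_value(99, {99})` returns 99, while `export_value(99, {99}, as_valid=False)` returns `None`.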
Selecting a Data Source
In the first panel of the Export to Database Wizard, you select the data source to which you want to
export data.
You can export data to any database source for which you have the appropriate ODBC driver. (Note:
Exporting data to OLE DB data sources is not supported.)
If you do not have any ODBC data sources configured, or if you want to add a new data source, click
Add ODBC Data Source.
• On Linux operating systems, this button is not available. ODBC data sources are specified in odbc.ini,
and the ODBCINI environment variable must be set to the location of that file. For more information,
see the documentation for your database drivers.
• In distributed analysis mode (available with IBM SPSS Statistics Server), this button is not available.
To add data sources in distributed analysis mode, see your system administrator.
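On Linux, a quick way to confirm that ODBCINI points to a readable odbc.ini is to list the data sources that the file defines. The Python sketch below is a diagnostic aid under stated assumptions and is not part of IBM SPSS Statistics. The function name is hypothetical, and it assumes that the file follows the common INI layout in which an optional [ODBC Data Sources] section lists the data source names. Check your driver documentation for the layout that your drivers actually use.

```python
import configparser
import os


def list_odbc_data_sources(environ=os.environ):
    """Return the data source names found in the file named by ODBCINI."""
    path = environ.get("ODBCINI")
    if not path:
        raise RuntimeError("the ODBCINI environment variable is not set")
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the original case of data source names
    if not parser.read(path):
        raise FileNotFoundError(path)
    # Assumed layout: an index section that lists every data source.
    if parser.has_section("ODBC Data Sources"):
        return list(parser["ODBC Data Sources"])
    # Otherwise, treat each section name as a data source.
    return parser.sections()
```

If the function raises `RuntimeError`, the environment variable is missing from the environment in which the program was started.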
An ODBC data source consists of two essential pieces of information: the driver that will be used to
access the data and the location of the database you want to access. To specify data sources, you must
have the appropriate drivers installed. Drivers for a variety of database formats are included with the
installation media.
Some data sources may require a login ID and password before you can proceed to the next step.
Choosing How to Export the Data
After you select the data source, you indicate the manner in which you want to export the data.
The following choices are available for exporting data to a database:
• Replace values in existing fields. Replaces values of selected fields in an existing table with values
from the selected variables in the active dataset. See the topic “Replacing Values in Existing Fields” on
page 35 for more information.
• Add new fields to an existing table. Creates new fields in an existing table that contain the values of
selected variables in the active dataset. See the topic “Adding New Fields” on page 35 for more
information. This option is not available for Excel files.
• Append new records to an existing table. Adds new records (rows) to an existing table containing the
values from cases in the active dataset. See the topic “Appending New Records (Cases)” on page 36 for
more information.
• Drop an existing table and create a new table of the same name. Deletes the specified table and
creates a new table of the same name that contains selected variables from the active dataset. All
information from the original table, including definitions of field properties (for example, primary keys,
data types) is lost. See the topic “Creating a New Table or Replacing a Table” on page 36 for more
information.
• Create a new table. Creates a new table in the database containing data from selected variables in the
active dataset. The name can be any value that is allowed as a table name by the data source. The
name cannot duplicate the name of an existing table or view in the database. See the topic “Creating a
New Table or Replacing a Table” on page 36 for more information.
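For readers who think in SQL, the five choices correspond roughly to the statements in the following Python sketch. It uses an in-memory SQLite database from the Python standard library as a stand-in for an ODBC data source. The table names, field names, and values are hypothetical, and the SQL that the export wizard actually issues is not documented here, so treat this only as an analogy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def primary_key_fields(table):
    """Names of the fields that PRAGMA table_info flags as primary key."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5]]


# An existing table to work with.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, income REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

# Replace values in existing fields.
conn.execute("UPDATE customers SET income = ? WHERE id = ?", (15.0, 1))

# Add new fields to an existing table.
conn.execute("ALTER TABLE customers ADD COLUMN region TEXT")
conn.execute("UPDATE customers SET region = ? WHERE id = ?", ("North", 1))

# Append new records to an existing table.
conn.execute("INSERT INTO customers VALUES (?, ?, ?)", (3, 30.0, "South"))

rows_after_append = conn.execute(
    "SELECT id, income, region FROM customers ORDER BY id").fetchall()
keys_before_replace = primary_key_fields("customers")

# Create a new table (the name must not duplicate an existing table or view).
conn.execute("CREATE TABLE survey (id INTEGER, score REAL)")

# Drop an existing table and create a new table of the same name.
conn.execute("DROP TABLE customers")
conn.execute("CREATE TABLE customers (id INTEGER, income REAL)")

rows_after_replace = conn.execute("SELECT * FROM customers").fetchall()
keys_after_replace = primary_key_fields("customers")
conn.commit()
```

After the last step, the re-created table is empty and no longer has a primary key. This is the loss of field properties that the fourth choice warns about.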
Selecting a Table
When modifying or replacing a table in the database, you need to select the table to modify or replace.
This panel in the Export to Database Wizard displays a list of tables and views in the selected database.
By default, the list displays only standard database tables. You can control the type of items that are
displayed in the list:
vT