CSV to SAV Instructions
Dina Sinclair
January 16, 2018
If you want to convert a file from CSV (a format you can get from the server) to SAV (a format you can load
into SPSS), this document will help you get started.

Overarching Code Format
When you convert a file from csv to sav in R, your code will follow these basic steps:
1. Import the data into R from your csv file
2. Clean the data. This might mean fixing the variable names or changing the type (numeric, string,
factor, etc) of desired columns.
3. Export the data into an sav file.
The following is an example piece of code. Lines starting with ‘#’ are comments, meaning that R ignores
them.
# STEP ONE: read (import) the data into R. Here we use the function read.csv, and the
# first entry '18 01 08 donnee.csv' is the name of the csv file we want to use.
# More on the import step in a later section.
d <- read.csv("18 01 08 donnee.csv", na.strings = c("---",""), check.names=FALSE)

# STEP TWO: clean the data before saving it as an sav file. Here that means shortening
# the variable names and making sure the number column of the data is saved as an integer.
# More on the cleaning step in a later section.
names(d) <- gsub(".*\\.", "",names(d))
names(d) <- make.names(names(d))
names(d) <- gsub("\\.\\.\\.",'_',names(d))
d$Number <- as.integer(d$Number)
# STEP THREE: export (save) the data as an sav file using the write_sav function from
# the haven library.
# More on the export step in a later section.
library(haven)
write_sav(d,"exported_donnee.sav")

Reading/Importing the Data
To successfully read the data from a csv, there are three important questions to ask.
1. How are pieces of data separated in this csv file? By commas (the default) or by some other means
(semicolons, spaces, etc)?
2. How are NAs represented in this csv file?
3. Do I want R to keep the original variable names, or can it fix the variable names so that they are
readable in R?


Data Entry Separation
To start to answer these questions, a good first step is to import the data into R using the read.csv command,
then look at the first six lines of data using the head command.
ex1 <- read.csv("Example.csv")
head(ex1)
##   ex.pays ex.interviewer ex.temps ex.registrer ex.nombre_denfants
## 1    Mali            ---       EB         0.82                  1
## 2    Mali        adoptee       EF         0.56                  2
## 3      GB    participant       EF         0.44                ---
## 4    Mali         simple       EB         0.55                  5
## 5     SEN        adoptee       EF         0.29                ---
## 6     SEN    participant       EF         0.45                  2

Above, the result of the head command gives you the correct number of columns and the columns are
filled with data as expected. Great - the file likely uses the comma to separate data values. If you open
“Example.csv” in a text editor, you will indeed see that all of the entries are separated by commas.
But what happens if the data is separated by a different character? Below, we see one such example.
ex2 <- read.csv("Example - Semicolons.csv")
head(ex2)
##   ex.pays.ex.interviewer.ex.temps.ex.registrer.ex.nombre_denfants
## 1                                              Mali;---;EB;0.82;1
## 2                                          Mali;adoptee;EF;0.56;2
## 3                                      GB;participant;EF;0.44;---
## 4                                           Mali;simple;EB;0.55;5
## 5                                         SEN;adoptee;EF;0.29;---
## 6                                       SEN;participant;EF;0.45;2

If you see the data only loads into one column or row (or maybe no data shows up at all), open the csv file in
a text editor. How are the data entries separated? If a character other than a comma is used, we’ll need to
tell R so that it knows how to read the data properly. If you open up “Example - Semicolons.csv” in a text
editor, you’ll see that it’s separated by semicolons. We can use the argument ‘sep’ in the read.csv function to
tell R that we need to use semicolons to separate data entries, then again use head to look at the data and
see if the problem has been fixed.
ex3 <- read.csv("Example - Semicolons.csv", sep = ';')
head(ex3)
##   ex.pays ex.interviewer ex.temps ex.registrer ex.nombre_denfants
## 1    Mali            ---       EB         0.82                  1
## 2    Mali        adoptee       EF         0.56                  2
## 3      GB    participant       EF         0.44                ---
## 4    Mali         simple       EB         0.55                  5
## 5     SEN        adoptee       EF         0.29                ---
## 6     SEN    participant       EF         0.45                  2

Sure enough, now the data loads in correctly.
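If opening the file in a text editor is inconvenient, a similar check can be made from inside R. The following is a minimal sketch rather than part of the workflow above: the file contents and the names tmp, first_line, candidates, counts and best_sep are made up for illustration. It counts how often each candidate separator appears in the header line and passes the most frequent one to read.csv.

```r
# Hypothetical example file, written to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("ex-pays;ex-temps;ex-registrer",
             "Mali;EB;0.82",
             "GB;EF;0.44"), tmp)

# Look at the raw header line, exactly as a text editor would show it
first_line <- readLines(tmp, n = 1)
print(first_line)

# Count how often each candidate separator appears in the header line
candidates <- c(",", ";", "\t")
header_chars <- strsplit(first_line, "", fixed = TRUE)[[1]]
counts <- sapply(candidates, function(ch) sum(header_chars == ch))
print(counts)

# Use the most frequent candidate as the 'sep' argument
best_sep <- candidates[which.max(counts)]
ex <- read.csv(tmp, sep = best_sep, check.names = FALSE)
print(ncol(ex))
```

This guess can be fooled, for example by commas inside quoted text, so it is still worth confirming the result with head().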
NA (Not Available) Entries
After loading in the data using the correct data entry separation, we also need to check that R has identified
the NA entries in the data. The default way to write NA in R is ‘NA’, and if your data uses a different set of
characters to represent NA entries, you need to tell R to look for that different set of characters. Find an NA
element in the head output of your dataset. Do you see it represented as the text NA, or something else?
ex4 <- read.csv("Example.csv")
head(ex4)
##   ex.pays ex.interviewer ex.temps ex.registrer ex.nombre_denfants
## 1    Mali            ---       EB         0.82                  1
## 2    Mali        adoptee       EF         0.56                  2
## 3      GB    participant       EF         0.44                ---
## 4    Mali         simple       EB         0.55                  5
## 5     SEN        adoptee       EF         0.29                ---
## 6     SEN    participant       EF         0.45                  2

In this example, we see that the NA elements are represented as ‘---’. Since R doesn’t know that ‘---’ is NA, it
will see ‘---’ as a string, and therefore read any columns with ‘---’ as strings or factors, even if the rest of the
elements in the column are numeric. To fix this, we can use the na.strings argument in the read.csv function.
ex5 <- read.csv("Example.csv", na.strings = '---')
head(ex5)
##   ex.pays ex.interviewer ex.temps ex.registrer ex.nombre_denfants
## 1    Mali           <NA>       EB         0.82                  1
## 2    Mali        adoptee       EF         0.56                  2
## 3      GB    participant       EF         0.44                 NA
## 4    Mali         simple       EB         0.55                  5
## 5     SEN        adoptee       EF         0.29                 NA
## 6     SEN    participant       EF         0.45                  2

Now, the NA entries show up as NA, like we want.
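To see why this matters for the column types, here is a small sketch under stated assumptions: the file contents and the names tmp, without_na and with_na are invented for illustration. It reads the same file with and without na.strings and compares the type of the number column.

```r
# Hypothetical example file with '---' standing in for a missing value
tmp <- tempfile(fileext = ".csv")
writeLines(c("pays,nombre_denfants",
             "Mali,1",
             "GB,---",
             "SEN,2"), tmp)

without_na <- read.csv(tmp)
with_na    <- read.csv(tmp, na.strings = "---")

# Without na.strings the column is read as text
# (character or factor, depending on the R version)
print(class(without_na$nombre_denfants))

# With na.strings the column is read as integer and '---' becomes NA
print(class(with_na$nombre_denfants))
print(sum(is.na(with_na$nombre_denfants)))
```

If your file uses more than one marker for missing values, na.strings also accepts a vector, as in the na.strings = c("---","") call in the example script at the top of this document.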
Keeping or Fixing Variable Names
Sometimes, the variable names (column headers) in the csv file will use characters R can’t use as variable
names. Example characters that are invalid in R variable names are -,*,$,+ and spaces. This is because these
characters represent operations in R. For example, if you had variables a, b and a-b, how would R know if
a-b means the variable ‘a-b’ or the variable ‘a’ minus the variable ‘b’?
If you don’t tell R to keep the original variable names, it will fix all of the variable names by changing all the
characters R can’t use to periods. Sometimes, though, you’ll want to keep the original variable names (we’ll
talk about when in the cleaning data section). If you want to keep the original variable names, you can do so
using the check.names argument in the read.csv function.
ex6 <- read.csv("Example.csv", check.names = FALSE)
head(ex6)
##   ex-pays ex-interviewer ex-temps ex-registrer ex-nombre_denfants
## 1    Mali            ---       EB         0.82                  1
## 2    Mali        adoptee       EF         0.56                  2
## 3      GB    participant       EF         0.44                ---
## 4    Mali         simple       EB         0.55                  5
## 5     SEN        adoptee       EF         0.29                ---
## 6     SEN    participant       EF         0.45                  2

Here, we can see that variable names that used to show up as ‘ex.pays’ and ‘ex.interviewer’ are now ‘ex-pays’
and ‘ex-interviewer’, as they were originally in the csv file.
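The two behaviours can be compared side by side. This is only an illustrative sketch: the two-column file and the names tmp, fixed and original are assumptions, not part of the example data.

```r
# Hypothetical file whose headers contain a dash and a space
tmp <- tempfile(fileext = ".csv")
writeLines(c("ex-pays,ex temps",
             "Mali,EB"), tmp)

# Default: R replaces characters it can't use with periods
fixed <- names(read.csv(tmp))
print(fixed)

# check.names = FALSE: the headers are kept exactly as written in the file
original <- names(read.csv(tmp, check.names = FALSE))
print(original)

# Columns with original names must be wrapped in backticks when using $
kept <- read.csv(tmp, check.names = FALSE)
print(kept$`ex-pays`)
```

The backticks are the price of keeping the original names, which is one reason the cleaning section later runs make.names() before exporting.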

Cleaning the Data
Before exporting to SPSS, we often want to make the data easier to use. This might mean changing the
variable names or the type (numeric, string, factor, etc) of a column of data.
Changing Variable Names
To change the variable names, we first need to know what the variable names are. We can figure that out
using the command names().
ex7 <- read.csv("Example.csv")
names(ex7)
## [1] "ex.pays"            "ex.interviewer"     "ex.temps"
## [4] "ex.registrer"       "ex.nombre_denfants"

Read through the names and decide: are you okay with the variable names, or would you like to reformat
them? Reformatting them might mean shortening them, changing them from upper to lower case, or any
other string manipulation you can come up with. If you’re happy with the variable names the way they are,
great - you can skip this section! If you’re not happy with the variable names, ask yourself: can I write down
clear instructions on how to change the names of these variables? Imagine you have to give these instructions
to a stranger, along with the list of variable names. If they can use the instructions to change all of the
variables correctly, then the instructions are good ones.
Examples of good (clear) instructions:
• Remove all text up to and including the last period, keep everything to the right of the last period.
• Change all letters to lowercase and change all ‘...’ to ‘_’
In this case, we might decide to remove all text up to and including the last period. To do so, we can use
the gsub() function.
ex8 <- read.csv("Example.csv")
names(ex8) <- gsub('.*\\.','',names(ex8))
names(ex8)
## [1] "pays"            "interviewer"     "temps"           "registrer"
## [5] "nombre_denfants"

Note that we don’t just call gsub, we assign the result of the gsub to names(ex8) to update the variable
names. Make sure to update the variable names by assigning a new value to names() every time you want to
make a change, otherwise your work won’t be saved!
If you can’t see any patterns in your variable names that will let you think of good change instructions, it
might be easier to look at the original variable names instead (remember, unless you tell R otherwise, it
replaces characters it can’t use in variable names with periods). To do so, use the check.names option we
mentioned in the importing section:
ex9 <- read.csv("Example.csv", check.names = FALSE)
names(ex9)
## [1] "ex-pays"            "ex-interviewer"     "ex-temps"
## [4] "ex-registrer"       "ex-nombre_denfants"

If you can see a good rule or set of instructions to use now, great. If not, R may not be able to help you,
since you can’t tell R what to do if you don’t have instructions to give it!
Here, we might decide to remove everything in the variable names up through the last dash ‘-’, or to simply
remove all instances of the phrase ‘ex-’. Either works just fine, and the code for both is below:
# Removing everything up through the last dash
gsub('.*-','',names(ex9))
## [1] "pays"            "interviewer"     "temps"           "registrer"
## [5] "nombre_denfants"

# Removing all instances of the phrase 'ex-'
gsub('ex-','',names(ex9))
## [1] "pays"            "interviewer"     "temps"           "registrer"
## [5] "nombre_denfants"

Note that these are examples of us trying out code, but they haven’t assigned the output to names(), so none
of the work is saved. If we see that they both work, we can pick either one to use. Let’s say we pick the first
way. Then the code we would need to write to save the results of our gsub to the variable names is
names(ex9) <- gsub('.*-','',names(ex9))
If you’re using the original variable names, after manipulating them you need to make sure R can read them.
To do that, we use the make.names() function. If you forget this step, you might get errors in your R code or
weird looking variables (v1, v2, etc) in your SPSS file.
names(ex9) <- make.names(names(ex9))
You can read more about the gsub function in its R help page (run ?gsub in the console). There is a huge
variety of string manipulation commands you can use, but a summary of key commands likely to come up for
Tostan data is
ex10 <- read.csv("Example - Names.csv", check.names = FALSE)
# The original names
names(ex10)
## [1] "EXAMPLE-DATA.COUNTRY"     "EXAMPLE-DATA.INTERVIEWER"
## [3] "EXAMPLE-DATA.PERIOD"

# Removing all characters up through the last period (here, \\. represents a period)
gsub('.*\\.','',names(ex10))
## [1] "COUNTRY"     "INTERVIEWER" "PERIOD"
# Removing all characters up through the last dash
gsub('.*-','',names(ex10))
## [1] "DATA.COUNTRY"     "DATA.INTERVIEWER" "DATA.PERIOD"
# Removing a specific set of characters (the period is escaped as \\. so that it
# matches a literal period rather than any character)
gsub('EXAMPLE-DATA\\.','',names(ex10))
## [1] "COUNTRY"     "INTERVIEWER" "PERIOD"
# Changing a set of characters to another set of characters
gsub('EXAMPLE-DATA','data', names(ex10))
## [1] "data.COUNTRY"     "data.INTERVIEWER" "data.PERIOD"

# Changing the words to lowercase (to change to uppercase, use toupper())
tolower(names(ex10))
## [1] "example-data.country"     "example-data.interviewer"
## [3] "example-data.period"
# Make the names readable by R
make.names(names(ex10))
## [1] "EXAMPLE.DATA.COUNTRY"     "EXAMPLE.DATA.INTERVIEWER"
## [3] "EXAMPLE.DATA.PERIOD"

You can also do several of these commands in a row. Note that order here matters! And remember to assign
the new value to names() every time you use a command. For example, the following are the same string
manipulation commands used in the example script at the top of this file.
# Read in the data, using the original variable names
ex11 <- read.csv("18 01 08 donnee.csv", check.names=FALSE)
# Look at the first 15 variable names
head(names(ex11),15)
## [1] "Number"
## [2] "Pays"
## [3] "localization.departement"
## [4] "localization.commune"
## [5] "localization.village"
## [6] "type_de_communaut"
## [7] "type_evaluation"
## [8] "interviewer"
## [9] "ethnie.nbre_groupe_ethnique"
## [10] "demog_enquete.CE1_caracteristiques_enquete.CE2_sexe"
## [11] "demog_enquete.CE1_caracteristiques_enquete.CE3_groupe_ethnique"
## [12] "demog_enquete.CE1_caracteristiques_enquete.CE4_age_enquete.CE4x1_connait_age"
## [13] "demog_enquete.CE1_caracteristiques_enquete.CE4_age_enquete.CE4x1a_age_exact"
## [14] "demog_enquete.CE1_caracteristiques_enquete.CE4_age_enquete.CE4x1b_tranche_age"
## [15] "demog_enquete.CE1_caracteristiques_enquete.CE5_etat_matrimonial"
# Remove the text up through the last period
names(ex11) <- gsub(".*\\.", "",names(ex11))
# Change the variable names so that R can read them (this will create some '...')
names(ex11) <- make.names(names(ex11))
# Change instances of '...' to '_' to increase readability
names(ex11) <- gsub("\\.\\.\\.",'_',names(ex11))
# Look at the first 15 resulting variable names
head(names(ex11),15)
##  [1] "Number"               "Pays"                 "departement"
##  [4] "commune"              "village"              "type_de_communaut"
##  [7] "type_evaluation"      "interviewer"          "nbre_groupe_ethnique"
## [10] "CE2_sexe"             "CE3_groupe_ethnique"  "CE4x1_connait_age"
## [13] "CE4x1a_age_exact"     "CE4x1b_tranche_age"   "CE5_etat_matrimonial"
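If you reuse the same renaming rules across several files, one option is to wrap the three steps in a small helper. This is a sketch only: the function name clean_names and the vector old are hypothetical and do not come from the original script.

```r
# Hypothetical helper bundling the three renaming steps, in the same order as above
clean_names <- function(x) {
  x <- gsub(".*\\.", "", x)        # remove the text up through the last period
  x <- make.names(x)               # make the names valid in R
  x <- gsub("\\.\\.\\.", "_", x)   # change any '...' to '_'
  x
}

# Three of the variable names shown above
old <- c("Number", "localization.commune", "ethnie.nbre_groupe_ethnique")
new <- clean_names(old)
print(new)
```

With a data frame d, the helper would be applied as names(d) <- clean_names(names(d)), which keeps the assignment to names() that the text above insists on.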

Changing Column Types
Sometimes, R might not guess the type (integer, numeric, string, factor, etc) of each column correctly. You
can fix that before you export to SPSS. If you notice in your SPSS file that a column isn’t the right type, you
can change it using the following commands:
ex12 <- read.csv("18 01 08 donnee.csv", na.strings = '---', check.names=FALSE)
# Convert to integer
ex12$Number <- as.integer(ex12$Number)
# Convert to factor
ex12$localization.commune <- as.factor(ex12$localization.commune)
# Convert to decimal
ex12$ethnie.nbre_groupe_ethnique <- as.numeric(ex12$ethnie.nbre_groupe_ethnique)
# Convert to character
ex12$interviewer <- as.character(ex12$interviewer)
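One thing to watch for when converting: if a column was read in as a factor, calling as.integer() or as.numeric() on it directly returns the factor's internal level codes, not the values you see printed. The sketch below uses a made-up factor f to show the difference; converting through as.character() first recovers the original values.

```r
# A hypothetical column that R has stored as a factor
f <- factor(c("10", "20", "10", "35"))

# Direct conversion gives the internal level codes
codes <- as.integer(f)
print(codes)    # 1 2 1 3

# Converting to character first gives the values themselves
values <- as.integer(as.character(f))
print(values)   # 10 20 10 35
```

Checking class() on a column before converting tells you which case you are in.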

Writing/Exporting the Data
Once your data is in a format you’re happy with, you can export it to an sav file, which you can open in
SPSS. To do so, we need to use a library (also known as a package) called haven. If you’ve never installed
haven before, go ahead and click on Tools>Install Packages, type the word ‘haven’ into the packages bar,
click install, and wait until the installation is complete. If you’ve already installed haven, or think you might
have, go ahead and move on to the next step - if it turns out haven isn’t installed, you’ll get an error that
reads
Error in library(haven) : there is no package called ‘haven’
in which case you should go install haven using the instructions above.
Once haven is installed, we need to load it in a file if we want to use it. To load haven, we use the command
library(haven). After that command, we can use the write_sav() function from the haven library to make an
sav file, with the name of the data and new filename as arguments. The file will then appear in R’s current
working directory, which is often (but not always) the folder where your script is saved; run getwd() to check.
library(haven)
ex13 <- read.csv("18 01 08 donnee.csv", na.strings = '---', check.names=FALSE)
write_sav(ex13,"new_sav_file.sav")
If the write_sav() command executes without errors, you should be able to open your new sav file in SPSS
now! You may get an error or two the first time you run your file, and that’s okay, since there are ways to fix
those errors.
Errors in the R Code
Here are some of the most likely errors you’ll encounter:
Error: SPSS only supports levels with <= 120 characters Problems: Number
This means that R is treating the Number column as a factor, and the message says that at least one of the
factor’s levels (its text labels) is longer than 120 characters, which SPSS does not support. Number should
actually be an integer, so to fix this, you can use the as.integer function on the problematic column
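library(haven)
ex13 <- read.csv("18 01 08 donnee.csv", na.strings = '---', check.names=FALSE)
# Go through as.character first: calling as.integer directly on a factor returns
# its internal level codes rather than the original values
ex13$Number <- as.integer(as.character(ex13$Number))
write_sav(ex13,"new_sav_file.sav")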


like we saw in the data cleaning section. If the column with the error should actually be a string, then change
it to characters using the command
ex13$Number <- as.character(ex13$Number)
After changing the format to character (or sometimes with columns that are already characters rather than
factors) you might see the error
Error in write_sav_(data, normalizePath(path, mustWork = FALSE)) : Writing failure: A provided string
value was longer than the available storage size of the specified column.
This means that one of the data entries is too long and takes up too much space (likely, it’s a long text
response to an open-ended question). One potential way of solving this is to reencode the problematic column
in a way that takes up less space, which you can do using the function enc2utf8():
ex13$Number <- enc2utf8(ex13$Number)

Errors in the SAV File After Opening SPSS
You may also see issues in the SAV file when you open it in SPSS. The most common such issue is that some
or all of the variable names will be replaced by v1, v2, v3, etc. If all of the variable names are lost, you most
likely forgot to make sure the names of the variables are readable in R. Add the line
names(data) <- make.names(names(data))
before the write_sav() command in your code, replacing the word data above with the name of your data
variable.
If instead only some of the variable names in your SPSS file are missing, you likely have some duplicate variable
names in your naming scheme. Try adding the line
names(data)[duplicated(names(data))]
to your code, right before the write_sav() command, again replacing the word data with the name of
your data variable. This line will print any duplicate names in your data, so if anything prints, you have
duplicates! Rethink your naming scheme so that there are no longer duplicate variable names.
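As a small illustration of the duplicate check, the sketch below uses a made-up vector of names (nms). It also shows make.unique(), a base R function not used elsewhere in this document, as a possible stopgap that appends a suffix to repeated names. Treat it as an assumption about what might suit your data; choosing meaningful distinct names is usually the better fix.

```r
# Hypothetical variable names after an overly aggressive shortening rule
nms <- c("age", "sexe", "age", "village", "age")

# Which names are repeats of an earlier name?
dups <- nms[duplicated(nms)]
print(dups)      # "age" "age"

# Stopgap: append _1, _2, ... to the repeats so every name is distinct
unique_nms <- make.unique(nms, sep = "_")
print(unique_nms)   # "age" "sexe" "age_1" "village" "age_2"
```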
