Eagle Manual


Package ‘Eagle’

December 12, 2018

Type Package

Title Multiple Locus Association Mapping on a Genome-Wide Scale

Version 1.2.0

Maintainer Andrew George <andrew.george@csiro.au>

Author Andrew George [aut, cre], Joshua Bowden [ctb], Ryan Stephenson [ctb], Hyun Kang [ctb], Noah Zaitlen [ctb], Claire Wade [ctb], Andrew Kirby [ctb], David Heckerman [ctb], Mark Daly [ctb], Eleazar Eskin [ctb]

Description An implementation of multiple-locus association mapping on a genome-wide scale. 'Eagle' can handle inbred and outbred study populations, populations of arbitrary unknown complexity, and data larger than the memory capacity of the computer. Since 'Eagle' is based on linear mixed models, it is best suited to the analysis of data on continuous traits. However, it can tolerate non-normal data. 'Eagle' reports, as its findings, the best set of snps in strongest association with a trait. Users unfamiliar with R can perform an analysis by running 'OpenGUI()'. This opens a web browser to the menu-driven user interface for the input of data and for performing genome-wide analysis.

License GPL-3

Depends R (>= 3.4), shinyFiles

Imports matrixcalc, shiny, shinythemes, shinyBS, shinyjs, stats, utils, parallel, data.table

LinkingTo RcppEigen, Rcpp

Roxygen list(wrap=FALSE)

LazyData true

ByteCompile TRUE

NeedsCompilation yes

URL http://eagle.r-forge.r-project.org

Contact eaglehelp@csiro.au


R topics documented:

Eagle-package

AM

FPR4AM

OpenGUI

ReadMap

ReadMarker

ReadPheno

ReadZmat

SummaryAM

Index

Eagle-package Eagle for Genome-wide Association Mapping

Description

An implementation of multiple-locus association mapping on a genome-wide scale. 'Eagle' can handle inbred and outbred study populations, populations of arbitrary unknown complexity, and data larger than the memory capacity of the computer. Since 'Eagle' is based on linear mixed models, it is best suited to the analysis of data on continuous traits. However, it can tolerate non-normal data. 'Eagle' reports, as its findings, the best set of snps in strongest association with a trait.

Users unfamiliar with R can perform an analysis by running 'OpenGUI()'. This opens a web browser to the menu-driven user interface for the input of data and for performing genome-wide analysis.

Details

Motivation: Data from genome-wide association studies are commonly analyzed with single-locus models; that is, analyses are performed on a locus-by-locus basis. Multiple-locus approaches, which model the association between a trait and multiple loci simultaneously, are more powerful. However, these methods do not scale well with study size, and many of the packages that implement them are not easy to use. Eagle was specifically designed to make genome-wide association mapping with multiple-locus models simple and practical.

Assumptions

1. Individuals are diploid but they can be inbred or outbred.

2. The marker and phenotype data are in separate files.

3. Marker loci are snps. Dominant and multi-allelic loci will need to be converted into biallelic (snp-like) loci.

4. The trait is continuous and normally distributed. Eagle can handle non-normally distributed trait data, but there may be a loss of power to detect marker-trait associations.

Important Functions:

1. ReadMarker for reading in the snp data.

2. ReadPheno for reading in the phenotypic data (traits and features/covariates).

3. ReadMap for reading in the marker map.

4. AM for performing association mapping on the data.

5. OpenGUI which opens the GUI.


Output: The key output from AM is a list of snps. Each snp identifies a separate genomic region of interest, housing genes that affect the trait. Additional summary information, such as the size of the snp effects, their statistical significance, and how much phenotypic variation they explain, can be obtained by running SummaryAM.

Where to get help: A variety of different help options are available.

• At the R prompt, type

library(help = "Eagle")

for an overview of the package and its functions.

• For detailed help on a function called "foo" say, type

help("foo")

• Visit the Eagle website at http://eagle.r-forge.r-project.org/ where you can find a quick start guide, instructions on getting the most out of Eagle, video tutorials, and other useful information.

Author(s)

Andrew W. George (Data61, CSIRO) with a lot of support from Joshua Bowden (IM&T, CSIRO)

Maintainer: Andrew W. George <andrew.george@csiro.au>

AM multiple-locus Association Mapping

Description

AM performs association mapping within a multiple-locus linear mixed model framework. AM finds the best set of marker loci in strongest association with a trait while simultaneously accounting for any fixed effects and the genetic background.

Usage

AM(trait = NULL, fformula = NULL, availmemGb = 8, geno = NULL,

pheno = NULL, map = NULL, Zmat = NULL, ncpu = detectCores(),

ngpu = 0, quiet = TRUE, maxit = 20, fixit = FALSE, gamma = NULL)

Arguments

trait the name of the column in the phenotype data file that contains the trait data. The name is case sensitive and must match exactly the column name in the phenotype data file.

fformula the right-hand side formula for the fixed effects, given as a character string such as 'cov1+cov2' (see Examples). If not specified, only an overall mean will be fitted.

availmemGb a numeric value. It specifies the amount of available memory (in gigabytes). This should be set to the maximum practical value of available memory for the analysis. If not specified, 8 gigabytes is assumed.

geno the R object obtained from running ReadMarker. This must be specified.

pheno the R object obtained from running ReadPheno. This must be specified.

map the R object obtained from running ReadMap. If not specified, a generic map will be assumed.

Zmat the R object obtained from running ReadZmat. If not specified, an identity matrix will be assumed.

ncpu an integer value for the number of CPUs that are available for distributed computing. The default is to determine the number of CPUs automatically.

ngpu an integer value for the number of GPUs available for computation. The default is to assume there are no GPUs available. This option has not yet been implemented.

quiet a logical value. If set to FALSE, additional runtime output is printed. This is useful for error checking and monitoring the progress of a large analysis.

maxit an integer value for the maximum number of forward steps to be performed. This will rarely need adjusting.

fixit a logical value. If TRUE, then maxit iterations are performed, regardless of the value of the model fit measure extBIC. If FALSE, then the model building process is stopped when extBIC increases in value.

gamma a value between 0 and 1 for the regularization parameter of the extBIC. Values close to 0 lead to an anti-conservative test. Values close to 1 lead to a more conservative test. If this value is left unspecified, a default value of 1 is assumed. See FPR4AM for an empirical approach to setting the gamma value.
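The interplay of maxit, fixit, and gamma can be pictured with a simplified sketch. It assumes the extended BIC of Chen and Chen (2008), extBIC = -2 logL + k log(n) + 2 gamma log(choose(p, k)), where k is the number of snps in the model, n the number of individuals, and p the number of candidate snps. It fits ordinary linear models with lm() rather than the linear mixed model that AM uses, so the genetic background and any Zmat are ignored. The helpers ext_bic() and forward_ebic() are hypothetical and are not part of 'Eagle'.

```r
# Extended BIC (Chen and Chen 2008), assumed form:
#   extBIC = -2 * logLik + k * log(n) + 2 * gamma * log(choose(p, k))
# k: snps in the model, n: individuals, p: candidate snps.
ext_bic <- function(loglik, k, n, p, gamma) {
  -2 * loglik + k * log(n) + 2 * gamma * lchoose(p, k)
}

# Greedy forward selection on an ordinary linear model (no random effect).
# Mirroring the documented fixit = FALSE behaviour, it stops once the best
# extra snp would not lower extBIC, or after maxit steps.
forward_ebic <- function(y, X, gamma = 1, maxit = 20) {
  n <- nrow(X)
  p <- ncol(X)
  selected <- integer(0)
  current <- ext_bic(as.numeric(logLik(lm(y ~ 1))), 0, n, p, gamma)
  for (step in seq_len(maxit)) {
    candidates <- setdiff(seq_len(p), selected)
    if (length(candidates) == 0) break
    # extBIC of every model that adds one more snp to the current set
    scores <- vapply(candidates, function(j) {
      fit <- lm(y ~ X[, c(selected, j), drop = FALSE])
      ext_bic(as.numeric(logLik(fit)), length(selected) + 1, n, p, gamma)
    }, numeric(1))
    if (min(scores) >= current) break
    selected <- c(selected, candidates[which.min(scores)])
    current <- min(scores)
  }
  list(selected = selected, extBIC = current)
}

# Simulated data: 200 individuals, 50 snp-like columns, one real effect.
set.seed(1)
X <- matrix(rnorm(200 * 50), nrow = 200)
y <- 2 * X[, 3] + rnorm(200)
res <- forward_ebic(y, X, gamma = 1)
res$selected
```

With gamma = 0 the extra penalty vanishes and the criterion reduces to the ordinary BIC, which is one way to see why values of gamma close to 0 admit more snps.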

Value

A list with the following components:

trait: column name of the trait being used by AM.

fformula: the fixed effects part of the linear mixed model.

indxNA: a vector containing the row indexes of those individuals whose trait or fixed effects data contain missing values and who have been removed from the analysis.

Mrk: a vector with the names of the snps in strongest and significant association with the trait. If no loci are found to be significant, then this component is NA.

Chr: the chromosomes on which the identified snps lie.

Pos: the map positions of the identified snps.

Indx: the column indexes in the marker file of the identified snps.

ncpu: the number of CPUs used for the calculations.

availmemGb: the amount of RAM in gigabytes that has been set by the user.

quiet: the logical value of the quiet parameter.

extBIC: a numeric vector with the extended BIC values for the loci found to be in significant association with the trait.

gamma: the numeric value of the gamma parameter.

See Also

FPR4AM, ReadMarker, ReadPheno, ReadZmat, and ReadMap


Examples

## Not run:

# Since the following code takes longer than 5 seconds to run, it has been tagged as dontrun.

# However, the code can be run by the user.

#-------------------------

# Example

#------------------------

# read the map

#~~~~~~~~~~~~~~

# File is a plain space separated text file with the first row

# the column headings

complete.name <- system.file('extdata','map.txt',

package='Eagle')

map_obj <- ReadMap(filename=complete.name)

# read marker data

#~~~~~~~~~~~~~~~~~~~~

# Reading in a PLINK ped file

# and setting the available memory on the machine for the reading of the data to 8 gigabytes

complete.name <- system.file('extdata','geno.ped',

package='Eagle')

geno_obj <- ReadMarker(filename=complete.name, type='PLINK', availmemGb=8)

# read phenotype data

#~~~~~~~~~~~~~~~~~~~~~~~

# Read in a plain text file with data on a single trait and two covariates

# The first row of the text file contains the column names y, cov1, and cov2.

complete.name <- system.file('extdata','pheno.txt', package='Eagle')

pheno_obj <- ReadPheno(filename=complete.name)

# Performing multiple-locus genome-wide association mapping with a model

# with fixed effects cov1 and cov2 and an intercept. The intercept

# need not be specified as it is assumed.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

res <- AM(trait = 'y',

fformula=c('cov1+cov2'),

map = map_obj,

pheno = pheno_obj,

geno = geno_obj, availmemGb=8)

## End(Not run)

FPR4AM Set the false positive rate for AM


Description

The gamma parameter in AM controls the false positive rate of the model building process. This function uses permutation to find the gamma value for a desired false positive rate.

Usage

FPR4AM(falseposrate = 0.05, trait = trait, numreps = 100,

fformula = NULL, availmemGb = 8, numgammas = 20, geno = NULL,

pheno = NULL, map = NULL, Zmat = NULL, ncpu = detectCores(),

ngpu = 0, seed = 101)

Arguments

falseposrate the desired false positive rate.

trait the name of the column in the phenotype data file that contains the trait data. The name is case sensitive and must match exactly the column name in the phenotype data file. This parameter must be specified.

numreps the number of replicates upon which to base the calculation of the false positive rate. We have found 100 replicates to be sufficient.

fformula the right-hand side formula for the fixed effects part of the model.

availmemGb a numeric value. It specifies the amount of available memory (in gigabytes). This should be set to the maximum practical value of available memory for the analysis.

numgammas the number of equidistant gamma values from 0 to 1 for which to calculate the false positive rate of the model building process. This should not need adjusting.

geno the R object obtained from running ReadMarker. This must be specified.

pheno the R object obtained from running ReadPheno. This must be specified.

map the R object obtained from running ReadMap. If not specified, a generic map will be assumed.

Zmat the R object obtained from running ReadZmat. If not specified, an identity matrix will be assumed.

ncpu an integer value for the number of CPUs that are available for distributed computing. The default is to determine the number of CPUs automatically.

ngpu an integer value for the number of GPUs available for computation. The default is to assume there are no GPUs available. This option has not yet been implemented.

seed an integer value for the starting seed for the permutations.

Details

The false positive rate for AM is controlled by its gamma parameter. Values close to 1 (0) decrease (increase) the false positive rate of detecting SNP-trait associations. There is no analytical way of setting gamma for a specified false positive rate, so permutation is used to do this empirically. By setting falseposrate to the desired false positive rate, this function will find the corresponding gamma value for AM.

A table of other gamma values for a range of false positive rates is also given.

To increase the precision of the gamma estimates, increase numreps.
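A rough base R sketch of how permutation can calibrate gamma, under strong simplifying assumptions: only the first forward step is considered, the model is an ordinary linear model, and the extBIC penalty for adding one snp to an empty model is assumed to be log(n) + 2 gamma log(p). Permuting the trait breaks any real marker-trait association, so any snp that enters the model is a false positive. The function calibrate_gamma() is hypothetical; its return components are named after those in the Value section below, but it is not FPR4AM's implementation, and the rule of picking the smallest gamma that meets the target is also an assumption.

```r
# Hypothetical calibration of gamma by permutation (simplified sketch).
# Assumed extBIC penalty for adding one snp to an empty model:
#   log(n) + 2 * gamma * log(p)
# For a single regressor, 2 * (logLik_1 - logLik_0) = -n * log(1 - r^2),
# so the best snp enters exactly when gamma < gstar, with
#   gstar = (-n * log(1 - r^2) - log(n)) / (2 * log(p)).
calibrate_gamma <- function(y, X, falseposrate = 0.05, numreps = 100,
                            numgammas = 20, seed = 101) {
  set.seed(seed)
  n <- nrow(X)
  p <- ncol(X)
  gammas <- seq(0, 1, length.out = numgammas)  # equidistant values in [0, 1]
  gstar <- replicate(numreps, {
    yp <- sample(y)                   # permuting breaks any true association
    r2 <- max(cor(yp, X)^2)           # best single-snp fit on permuted trait
    (-n * log(1 - r2) - log(n)) / (2 * log(p))
  })
  # Proportion of permutations in which some snp enters at each gamma.
  falsepos <- vapply(gammas, function(g) mean(gstar > g), numeric(1))
  ok <- which(falsepos <= falseposrate)
  setgamma <- if (length(ok) > 0) gammas[min(ok)] else NA_real_
  list(numreps = numreps, gamma = gammas, falsepos = falsepos,
       setgamma = setgamma)
}

# Usage on simulated null data: 100 individuals, 200 snp-like columns.
set.seed(2)
X <- matrix(rnorm(100 * 200), nrow = 100)
y <- rnorm(100)
ans <- calibrate_gamma(y, X)
ans$setgamma
```

Because the estimated false positive rate can only fall as gamma grows, the smallest gamma that meets the target is the least conservative choice that still respects it.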


Value

A list with the following components:

numreps: the number of permutations performed.

gamma: the vector of gamma values.

falsepos: the false positive rates for the gamma values.

setgamma: the gamma value that gives a false positive rate of falseposrate.


Examples

## Not run:

# Since the following code takes longer than 5 seconds to run, it has been tagged as dontrun.

# However, the code can be run by the user.

#-------------------------

# Example

#------------------------

# read the map

#~~~~~~~~~~~~~~

# File is a plain space separated text file with the first row

# the column headings

complete.name <- system.file('extdata','map.txt',

package='Eagle')

map_obj <- ReadMap(filename=complete.name)

# read marker data

#~~~~~~~~~~~~~~~~~~~~

# Reading in a PLINK ped file

# and setting the available memory on the machine for the reading of the data to 8 gigabytes

complete.name <- system.file('extdata','geno.ped',

package='Eagle')

geno_obj <- ReadMarker(filename=complete.name, type='PLINK', availmemGb=8)

# read phenotype data

#~~~~~~~~~~~~~~~~~~~~~~~

# Read in a plain text file with data on a single trait and two covariates

# The first row of the text file contains the column names y, cov1, and cov2.

complete.name <- system.file('extdata','pheno.txt', package='Eagle')

pheno_obj <- ReadPheno(filename=complete.name)

# Suppose we want to perform the AM analysis at a 5% false positive rate.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ans <- FPR4AM(falseposrate = 0.05,

trait = 'y',


fformula=c('cov1+cov2'),

map = map_obj,

pheno = pheno_obj,

geno = geno_obj)

res <- AM(trait = 'y',

fformula=c('cov1+cov2'),

map = map_obj,

pheno = pheno_obj,

geno = geno_obj,

gamma = ans$setgamma)

## End(Not run)

OpenGUI Browser-based Graphical User Interface

Description

Opens a web browser to act as a user-friendly interface to 'Eagle'.

Usage

OpenGUI()

Details

OpenGUI is an easy-to-use web-based interface for 'Eagle'. By clicking on the navigation tabs at the top of a page, data can be read and analysed. By using this GUI, a user can avoid having to write R code.

Note that even though a web browser is being used as the user interface, everything remains local to the computer.

Examples

## Not run:

# opens a web browser

OpenGUI()

## End(Not run)


ReadMap Read map file

Description

Read in the marker map data.

Usage

ReadMap(filename = NULL, csv = FALSE, header = TRUE)

Arguments

filename contains the name of the map file. The file name needs to be in quotes. If the file is not in the working directory, then the full path to the file is required.

csv a logical value. When TRUE, a csv file format is assumed. When FALSE, a space separated format is assumed.

header a logical value. When TRUE, the first row of the file contains the column headings.

Details

Association mapping, unlike classical linkage mapping, does not require a map to find marker-trait associations, so reading in a map file is optional. If a map file is supplied, then the marker names from this file are used when reporting the findings from AM. If a map file is not supplied, then generic names M1, M2, ... are assigned to the marker loci, where the number refers to the column number in the marker file.

A space separated text file with column headings is assumed as the default input. The map file can have three or four columns. If the map file has three columns, then the columns are assumed to be the marker locus names, the chromosome number, and the map position (in any units). If the map file has four columns, as with a PLINK map file, then the columns are assumed to be the marker locus names, the chromosome number, the map position in centimorgans, and the map position in base pairs.

Missing values are allowed but not in the first column of the file (i.e. the marker labels are not allowed to be missing).

The order of the marker loci in this file is assumed to be the same as the order of the loci in the marker data file.

The first column of the map file is assumed to contain the marker names.
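The layout rules above can be checked with base R before calling ReadMap. This sketch writes a hypothetical three-column map to a temporary file (the headings snp, chrm, and pos are made up for illustration), reads it back with read.table(), and confirms that no marker name is missing. It is not a replacement for ReadMap.

```r
# Hypothetical three-column map: marker name, chromosome, position.
map_lines <- c("snp chrm pos",
               "M1 1 0.5",
               "M2 1 NA",
               "M3 2 1.2")
map_file <- tempfile(fileext = ".txt")
writeLines(map_lines, map_file)

# Space separated with a header row, matching ReadMap's default settings.
map_df <- read.table(map_file, header = TRUE, stringsAsFactors = FALSE)

# Missing positions are allowed, but every marker must be named.
if (anyNA(map_df[[1]])) stop("marker names in column 1 must not be missing")

# Three columns: name, chromosome, position; four would suggest a PLINK map.
ncol(map_df)
str(map_df)
```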

Value

A data frame containing the map data is returned.

See Also

ReadMarker and ReadPheno.


Examples

# Read in example map data from ./extdata/

# find the full location of the map data

complete.name <- system.file('extdata','map.txt', package='Eagle')

# read in map data

map_obj <- ReadMap(filename=complete.name)

# look at first few rows of the map file

head(map_obj)

ReadMarker Read marker data.

Description

A function for reading in marker data. Two types of data can be read.

Usage

ReadMarker(filename = NULL, type = "text", missing = NULL, AA = NULL,

AB = NULL, BB = NULL, availmemGb = 16, quiet = TRUE)

Arguments

filename contains the name of the marker file. The file name needs to be in quotes. If the file is not in the working directory, then the full path to the file is required.

type the type of file. Choices are 'text' (the default) and 'PLINK'.

missing the number or character for a missing genotype in the text file. There is no need to specify this for a PLINK ped file. Missing allele values in a PLINK file must be coded as '0' or '-'.

AA the character or number corresponding to the 'AA' snp genotype in the marker genotype file. This need only be specified if the file type is 'text'. If a character, then it must be in quotes.

AB the character or number corresponding to the 'AB' snp genotype in the marker genotype file. This need only be specified if the file type is 'text'. This can be left unspecified if there are no heterozygous genotypes (i.e. the individuals are inbred). Only a single heterozygous genotype is allowed ('Eagle' does not distinguish between 'AB' and 'BA'). If specified and a character, it must be in quotes.

BB the character or number corresponding to the 'BB' snp genotype in the marker genotype file. This need only be specified if the file type is 'text'. If a character, then it must be in quotes.

availmemGb a numeric value. It specifies the amount of available memory (in gigabytes). This should be set to be as large as possible for best performance.

quiet a logical value. If set to FALSE, additional runtime output is printed.

Details

ReadMarker can handle two different types of marker data; namely, genotype data in a plain text file, and PLINK ped files.

Reading in a plain text file containing the marker genotypes: To load a text file that contains snp genotypes, run ReadMarker with filename set to the name of the file, and AA, AB, BB set to the corresponding genotype values. The genotype values in the text file can be numeric, character, or a mix of both.

We make the following assumptions:

• The text file does not contain row or column headings.

• The file is allowed to contain missing genotypes that have been coded according to missing.

• Individuals are diploid.

• The rows of the text file are the individuals and the columns are the marker loci.

• The file is space separated.

• The mapping of the observed genotypes in the marker file to AA, AB, and BB remains the same for all loci.

• Individuals are outbred when AA, AB, and BB are specified and inbred when only AA and BB are specified.

• For a text file, the same alphanumeric value is used for all missing marker genotypes. For a PLINK ped file, the missing allele is allowed to be '0' or '-'.

For example, suppose we have a space separated text file with marker genotype data collected from five snp loci on three individuals, where the snp genotype AA has been coded 0, the snp genotype AB has been coded 1, the snp genotype BB has been coded 2, and missing genotypes are coded as 99:

0 1 2 0 2

1 1 0 2 0

2 2 1 1 99

The file is called geno.txt and is located in the directory /my/dir/.

To load these data, we would use the command

geno_obj <- ReadMarker(filename='/my/dir/geno.txt', AA=0, AB=1, BB=2, type='text', missing=99)

where the results from running the function are placed in geno_obj.

As another example, suppose we have a space separated text file with marker genotype data collected from five snp loci on three individuals, where the snp genotype AA has been coded a/a, the snp genotype AB has been coded a/b, the snp genotype BB has been coded b/b, and missing genotypes are coded as NA:

a/a a/b b/b a/a b/b

a/b a/b a/a b/b a/a

b/b b/b a/b a/b NA

The file is called geno.txt and is located in the same directory from which R is being run (i.e. the working directory).

To load these data, we would use the command

geno_obj <- ReadMarker(filename='geno.txt', AA='a/a', AB='a/b', BB='b/b',

type='text', missing = 'NA')

where the results from running the function are placed in geno_obj.
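To make the AA, AB, BB mapping concrete, here is a base R sketch that recodes the second example file into allele counts (AA = 0, AB = 1, BB = 2). The 0/1/2 coding and the helper recode_genotypes() are assumptions for illustration; ReadMarker does not return a genotype matrix but writes its own reformatted files, as described under Value.

```r
# Second example file from above, coded a/a, a/b, b/b with NA for missing.
geno_txt <- c("a/a a/b b/b a/a b/b",
              "a/b a/b a/a b/b a/a",
              "b/b b/b a/b a/b NA")
# Read every cell as text; keep "NA" as a literal code for now.
raw <- as.matrix(read.table(text = geno_txt, colClasses = "character",
                            na.strings = character(0)))

# Hypothetical helper: map genotype codes to allele counts 0/1/2.
recode_genotypes <- function(m, AA, AB = NULL, BB, missing = NULL) {
  allowed <- as.character(c(AA, AB, BB, missing))
  if (!all(m %in% allowed)) stop("unexpected genotype code in marker file")
  codes <- matrix(NA_integer_, nrow = nrow(m), ncol = ncol(m))
  codes[which(m == AA)] <- 0L
  if (!is.null(AB)) codes[which(m == AB)] <- 1L  # AB may be omitted for inbreds
  codes[which(m == BB)] <- 2L
  codes  # cells holding the missing code keep their initial NA
}

g <- recode_genotypes(raw, AA = "a/a", AB = "a/b", BB = "b/b", missing = "NA")
g
```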


Reading in a PLINK ped ﬁle: PLINK is a well known toolkit for the analysis of genome-wide

association data. See https://www.cog-genomics.org/plink2 for details.

Full details of PLINK ped files can be found at https://www.cog-genomics.org/plink/1.9/formats#ped. Briefly, the PED file is a space delimited file (tabs are not allowed), and the first six columns are mandatory:

Family ID

Individual ID

Paternal ID

Maternal ID

Sex (1=male; 2=female; other=unknown)

Phenotype

Here, these columns can be any values since ReadMarker ignores these columns.

Genotypes (column 7 onwards) can be any character (e.g. 1,2,3,4 or A,C,G,T or anything else) except 0, which is, by default, the missing genotype character. All markers should be biallelic. All snps must have two alleles specified. Missing alleles (i.e. 0 or -) are allowed. No column headings should be given.

As an example, suppose we have data on three individuals genotyped for four snp loci:

FAM001 101 0 0 1 0 A G C C C G A A

FAM001 201 0 0 2 0 A A C T G G T A

FAM001 300 101 201 2 0 G A T T C G A T

Then to load these data, we would use the command

geno_obj <- ReadMarker(filename='PLINK.ped', type='PLINK')

where geno_obj is the object passed to AM, and the file PLINK.ped is located in the working directory (i.e. the directory from which R is being run).
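The PED layout can likewise be illustrated in base R. This sketch drops the six leading columns of the example above, pairs the remaining alleles per snp, and counts copies of the first allele observed at each locus. That choice of counted allele is an arbitrary convention for this example, and ped_to_counts() is a hypothetical helper; how ReadMarker codes alleles internally is not described here.

```r
# PLINK ped lines from the example above (no column headings).
ped_txt <- c("FAM001 101 0 0 1 0 A G C C C G A A",
             "FAM001 201 0 0 2 0 A A C T G G T A",
             "FAM001 300 101 201 2 0 G A T T C G A T")

# Hypothetical helper: count copies of the first allele seen at each snp.
ped_to_counts <- function(lines) {
  fields <- strsplit(trimws(lines), " +")
  # Drop the six mandatory leading columns; keep the allele columns.
  alleles <- do.call(rbind, lapply(fields, function(f) f[-(1:6)]))
  nsnp <- ncol(alleles) %/% 2
  counts <- matrix(NA_integer_, nrow = nrow(alleles), ncol = nsnp)
  for (j in seq_len(nsnp)) {
    a1 <- alleles[, 2 * j - 1]
    a2 <- alleles[, 2 * j]
    ok <- !(a1 %in% c("0", "-") | a2 %in% c("0", "-"))  # '0' or '-' = missing
    observed <- unique(c(a1[ok], a2[ok]))
    if (length(observed) > 2) stop("snp ", j, " is not biallelic")
    if (length(observed) == 0) next
    ref <- observed[1]
    counts[ok, j] <- (a1[ok] == ref) + (a2[ok] == ref)
  }
  counts
}

cnt <- ped_to_counts(ped_txt)
cnt
```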

Reading in other formats: Having first installed the stand-alone PLINK software, it is possible to convert other file formats into PLINK ped files. See https://www.cog-genomics.org/plink/1.9/formats for details.

For example, to convert a vcf file into a PLINK ped file, at the unix prompt, use the PLINK command

plink --vcf filename.vcf --recode --out newfilename

and to convert a binary ped file (bed) into a ped file, use the PLINK command

plink --bfile filename --recode --tab --out newfilename

Value

To allow AM to handle data larger than the memory capacity of a machine, ReadMarker doesn't load the marker data into memory. Instead, it creates a reformatted file of the marker data and its transpose. The object returned by ReadMarker is a list with the elements asciifileM, the full file name (name and path) of the reformatted file for the marker data; asciifileMt, the full file name of the reformatted file for the transpose of the marker data; and dim_of_ascii_M, a 2-element vector whose first element is the number of individuals and whose second element is the number of marker loci.


Examples

#--------------------------------

# Example 1

#-------------------------------

# Read in the genotype data contained in the text file geno.txt

# The function system.file() gives the full file name (name + full path).

complete.name <- system.file('extdata','geno.txt', package='Eagle')

# The full path and name of the file is

print(complete.name)

# Here, 0 values are being treated as genotype AA,

# 1 values are being treated as genotype AB,

# and 2 values are being treated as genotype BB.

# 4 gigabytes of memory has been specified.

# The file is space separated with the rows the individuals

# and the columns the snp loci.

geno_obj <- ReadMarker(filename=complete.name, type='text', AA=0, AB=1, BB=2, availmemGb=4)

# view list contents of geno_obj

print(geno_obj)

#--------------------------------

# Example 2

#-------------------------------

# Read in the allelic data contained in the PLINK ped file geno.ped

# The function system.file() gives the full file name (name + full path).

complete.name <- system.file('extdata','geno.ped', package='Eagle')

# The full path and name of the file is

print(complete.name)

# Here, the first 6 columns are being ignored and the allelic

# information in columns 7 - 10002 is being converted into a reformatted file.

# 4 gigabytes of memory has been specified.

# The file is space separated with the rows the individuals

# and the columns the snp loci.

geno_obj <- ReadMarker(filename=complete.name, type='PLINK', availmemGb=4)

# view list contents of geno_obj

print(geno_obj)

ReadPheno Read phenotype file

Description

Read in the phenotype data.


Usage

ReadPheno(filename = NULL, header = TRUE, csv = FALSE, missing = NULL)

Arguments

filename contains the name of the phenotype file. The file name needs to be in quotes. If the file is not in the working directory, then the full path to the file is required.

header a logical value. When TRUE, the first row of the file contains the names of the columns. Default is TRUE.

csv a logical value. When TRUE, a csv file format is assumed. When FALSE, a space separated format is assumed. Default is FALSE.

missing the number or character for a missing phenotype value.

Details

ReadPheno reads in the phenotype data, which are data measured on traits and any fixed effects (or predictors/features/explanatory variables). A space separated plain text file is assumed. Each row in this file corresponds to an individual. The number of rows in the phenotype file must be the same as the number of rows in the marker data file. Also, the ordering of the individuals must be the same in the two files. A space separated file with column headings is the default, but this can be changed with the header and csv options.

The phenotype file may contain multiple traits and fixed effects variables.

Missing values are allowed. To tell Eagle which value should be treated as missing, set the missing parameter to that value.

For example, suppose we have three individuals for which we have collected data on two quantitative traits (y1 and y2) and four explanatory variables (age, weight, height, and sex). The data look like this:

y1 y2 age weight height sex

112.02 -3.123 26 75 168.5 M

156.44 1.2 45 102 NA NA

10.3 NA 28 98 189.4 F

where the first row has the column headings and the next three rows contain the observed data on three individuals.

To load these data, we would use the command

pheno_obj <- ReadPheno(filename='pheno.dat', missing='NA')

where pheno.dat is the name of the phenotype file, and pheno_obj is the R object that contains the results from reading in the phenotype data. The file is located in the working directory, so there is no need to specify the full path; the file name alone suffices.

Dealing with missing trait data:

AM deals automatically with individuals with missing trait data. These individuals are removed from the analysis and a warning message is generated.

Dealing with missing fixed effects values:

AM deals automatically with individuals with missing fixed effects values. These individuals are removed from the analysis and a warning message is generated.
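Which individuals would be dropped can be previewed in base R. This sketch reads the example phenotype table above and, assuming a hypothetical analysis with trait y1 and fixed effects age and sex, lists the incomplete rows. The variable name indxNA only echoes the AM output component of that name; AM's own handling may differ in detail.

```r
pheno_txt <- c("y1 y2 age weight height sex",
               "112.02 -3.123 26 75 168.5 M",
               "156.44 1.2 45 102 NA NA",
               "10.3 NA 28 98 189.4 F")
pheno <- read.table(text = pheno_txt, header = TRUE, na.strings = "NA",
                    stringsAsFactors = FALSE)

# Hypothetical analysis: trait y1 with fixed effects age and sex.
used <- c("y1", "age", "sex")
keep <- complete.cases(pheno[, used])
indxNA <- which(!keep)          # rows that would be removed
analysis_data <- pheno[keep, ]
indxNA
```

Only the columns entering the model matter here: individual 3 is kept because its missing value is in y2, which this analysis does not use.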


Value

A data frame containing the phenotype data is returned. If header is TRUE, the names of the columns will be as specified by the first row of the phenotype file. If header is FALSE, generic names are supplied by R in the form of V1, V2, etc. If no column headings are given, these generic names will need to be used in the trait and fformula parameters in AM. You can print out the column names of the data frame by using

names(pheno_obj)

The column names are also printed along with other summary information when ReadPheno is run.

See Also

ReadMarker for reading in marker data, AM for performing association mapping.

Examples

# Read in phenotype data from ./extdata/

# find the full location of the phenotype data

complete.name <- system.file('extdata','pheno.txt', package='Eagle')

pheno_obj <- ReadPheno(filename=complete.name)

## print a couple of lines of the data file

head(pheno_obj)

ReadZmat Read Z matrix

Description

Read in the Z matrix that assigns groups/strains/lines to their trait measurements.

Usage

ReadZmat(filename = NULL)

Arguments

filename contains the name of the Z matrix file. The file name needs to be in quotes. If the file is not in the working directory, then the full path to the file is required.

Details

The underlying linear mixed model is of the form

$$Y = X\beta + Z u_g + e$$

where $Z$ is an $n \times n_g$ matrix that contains ones and zeros, $n$ is the number of trait measurements, and $n_g$ is the number of groups/strains/lines. If $n$ and $n_g$ are the same, then there is no need to specify $Z$. However, if a group/strain/line has multiple trait measurements (i.e. $n > n_g$), then the Z matrix is needed to tell Eagle which trait measurements belong to which groups/strains/lines.

A space separated text file is assumed. Each row of the matrix contains multiple zeroes but only a single one. The file cannot contain column or row headings. The file also cannot contain a row of only zeroes. Here, $n$ must be larger than $n_g$; otherwise, an error will be issued.
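A Z matrix with the required layout can be built in base R from a vector recording which group/strain/line each trait measurement belongs to. The data below (six measurements on three lines) are hypothetical. The sketch checks the rules above, writes the matrix without headings, and shows that Z %*% u_g repeats each line's effect for every one of its measurements, matching the Z u_g term of the model.

```r
# Hypothetical data: six trait measurements taken on three lines.
line_of_record <- c(1, 1, 2, 3, 3, 3)  # line index of each measurement
n_g <- 3
Z <- diag(n_g)[line_of_record, ]       # n x n_g matrix of 0s and 1s

# Rules from the text: exactly one 1 per row (so no all-zero rows), n > n_g.
stopifnot(all(rowSums(Z) == 1), nrow(Z) > ncol(Z))

# Space separated, with no row or column headings.
z_file <- tempfile(fileext = ".txt")
write.table(Z, z_file, row.names = FALSE, col.names = FALSE)

# Z %*% u_g gives each measurement the effect of its own line.
u_g <- c(0.5, -1, 2)
as.vector(Z %*% u_g)
```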

Value

A matrix containing the Z matrix data is returned.

See Also

ReadMarker and ReadPheno.

Examples

# Read in example Z matrix from ./extdata/

# find the full location of the Z matrix data

complete.name <- system.file('extdata','Z.txt', package='Eagle')

# read in Z matrix data

Z_obj <- ReadZmat(filename=complete.name)

# look at first few rows of the Z matrix file

head(Z_obj)

SummaryAM Summary of multiple locus association mapping results

Description

A summary function that provides additional information on the significant marker-trait associations found by AM.

Usage

SummaryAM(AMobj = NULL, pheno = NULL, geno = NULL, map = NULL)

Arguments

AMobj the (list) object obtained from running AM. Must be specified.

pheno the (data frame) object obtained from running ReadPheno. Must be specified.

geno the (list) object obtained from running ReadMarker. Must be specified.

map the (data frame) object obtained from running ReadMap. The default is to assume a map object has not been supplied. Optional.

Details

SummaryAM produces two tables of results. First, a table is produced with the additive effect size and p-value for each fixed effect in the final model. Second, a table is produced with the proportion of phenotypic variance explained by the different multiple-locus models. Each row in this table is the proportion of phenotypic variance explained (Sun et al. 2010) after the marker locus has been added to the multiple-locus model.
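Sun et al. (2010) compare several measures of variance explained for mixed models; one of them is the likelihood-ratio statistic R²_LR = 1 - exp(-(2/n)(log L_full - log L_null)). Assuming that measure, the sketch below applies it to nested ordinary linear models, where it coincides with the usual R². Which measure SummaryAM actually reports, and how it treats the random effect, is not stated here, and r2_lr() is a hypothetical helper.

```r
# Likelihood-ratio R^2 (assumed measure):
#   R2_LR = 1 - exp(-(2 / n) * (logLik_full - logLik_null))
r2_lr <- function(fit_full, fit_null) {
  n <- nobs(fit_full)
  delta <- as.numeric(logLik(fit_full)) - as.numeric(logLik(fit_null))
  1 - exp(-(2 / n) * delta)
}

# Simulated data: two predictors standing in for snps added one at a time.
set.seed(3)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.8 * x1 + 0.3 * x2 + rnorm(n)
null_fit <- lm(y ~ 1)
fits <- list(lm(y ~ x1), lm(y ~ x1 + x2))
# One cumulative value per model, analogous to the rows of the second table.
r2 <- sapply(fits, r2_lr, fit_null = null_fit)
r2
```

For a Gaussian linear model fitted by maximum likelihood, exp(-(2/n)(log L_full - log L_null)) equals RSS_full / RSS_null, which is why the two R² values agree in this fixed-effects setting.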

References

Sun G., Zhu C., Kramer M.H., Yang S.-S., et al. 2010. Variation explained in mixed model association mapping. Heredity 105, 330-340.


Examples

## Not run:

# Since the following code takes longer than 5 seconds to run, it has been tagged as dontrun.

# However, the code can be run by the user.

#---------------

# read the map

#---------------

# File is a plain space separated text file with the first row

# the column headings

complete.name <- system.file('extdata','map.txt',

package='Eagle')

map_obj <- ReadMap(filename=complete.name)

# to look at the first few rows of the map file

head(map_obj)

#------------------

# read marker data

#------------------

# Reading in a PLINK ped file

# and setting the available memory on the machine for the reading of the data to 8 gigabytes

complete.name <- system.file('extdata','geno.ped',

package='Eagle')

geno_obj <- ReadMarker(filename=complete.name, type='PLINK', availmemGb=8)

#----------------------

# read phenotype data

#-----------------------

# Read in a plain text file with data on a single trait and two fixed effects

# The first row of the text file contains the column names y, cov1, and cov2.

complete.name <- system.file('extdata','pheno.txt', package='Eagle')

pheno_obj <- ReadPheno(filename=complete.name)

#-------------------------------------------------------


# Perform multiple-locus genome-wide association mapping

#-------------------------------------------------------

res <- AM(trait = 'y',

fformula=c("cov1 + cov2"),

map = map_obj,

pheno = pheno_obj,

geno = geno_obj, availmemGb=8)

#-----------------------------------------

# Produce additional summary information

#------------------------------------------

SummaryAM(AMobj=res, pheno=pheno_obj, geno=geno_obj, map=map_obj)

## End(Not run)

Index

AM

Eagle (Eagle-package)

Eagle-package

FPR4AM

OpenGUI

ReadMap

ReadMarker

ReadPheno

ReadZmat

SummaryAM
