Manual
User Manual:
Open the PDF directly: View PDF
.
Page Count: 10
| Download | |
| Open PDF In Browser | View PDF |
MAPTest
June 17, 2019
Title MAP Test for detection DE genes
Version 0.0.0.9000
Description MAP test for analyzing temporal gene expression data.
Depends R (>= 3.4.0)
License GPL-2
Encoding UTF-8
LazyData true
RoxygenNote 6.1.1
Imports EQL,MASS,foreach,mclust,mvtnorm,parallel,randtoolbox,stats,matlib
R topics documented:
data_generation
estimation_fun
MAP_test . . .
Summary_MAP
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Index
data_generation
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
3
5
7
10
Simulate MAP data set:
Description
Simulate MAP data set:
Usage
data_generation(G = 100, n_control = 10, n_treat = 10, n_rep = 3,
k_real = 4, sigma2_r = rep(1, 2), sigma1_2_r = 1,
sigma2_2_r = c(3, 2), mu1_r = 4, phi_g_r = rep(1, 100),
p_k_real = c(0.7, 0.1, 0.1, 0.1), x = x)
1
2
data_generation
Arguments
G
Number of genes to simulate
n_control
Number of time points in control group
n_treat
Number of time points in treatment group
n_rep
Number of replicates in both group
k_real
Always be 4
sigma2_r
Variance parameter for τg
sigma1_2_r
Variance parameter for ηg1
sigma2_2_r
Variance parameter for ηg2
mu1_r
Mean parameter for ηg1
phi_g_r
Dispersion parameter
p_k_real
True proportion for each mixture component
x
Time structured design for the simulated data
Details
The vector of read counts for gene g, treatment group i, replicate j, at time point t, Ygij (t), follows a
Negative Binomial distribution parameterized mean λgi and φg , where E[Ygij (t)] = λgi (t). λgi (t)
is further modeled as λgi (t) = Sij exp[ηg1 Ii=2 + B 0 (t)ηg2 Ii=2 + B 0 (t)τg ]. We have B 0 (t) are
design matrix, which is constructed by 2 orthorgonal polynomial bases.
• t = 1,..., n_treat (or n_control if control group);
• j = 1,..., n_rep;
• g = 1,...,G; and
• [ηg1 , ηg2 , τg ] ~ 4-component gausssian mixture model
Value
• Y1 Simulated data
Examples
library(matlib)
n_basis = 2
n_control = 10
n_treat
= 10
n_rep = 3
tt_treat = c(1:n_treat)/n_treat
nt = length(tt_treat)
ind_t = sort(sample(c(1:nt), n_control))
tt = tt_treat[ind_t]
tttt = c(rep(tt, n_rep), rep(tt_treat, n_rep))
z = x = matrix(0, length(tttt), n_basis)
z[,1] = 1.224745*tttt
z[,2] = -0.7905694 + 2.371708*tttt^2
x[,1] = z[,1] - Proj(z[,1], rep(1, length(tttt)))
x[,2] = z[,2] - Proj(z[,2], rep(1, length(tttt))) - Proj(z[,2], x[,1])
Y1 = data_generation(G = 100,
n_control = n_control,
n_treat
= n_treat,
estimation_fun
3
n_rep
= n_rep,
k_real = 4,
sigma2_r = rep(1, 2),
sigma1_2_r = 1,
sigma2_2_r = c(3,2),
mu1_r = 4,
phi_g_r = rep(1, 100),
p_k_real = c(0.7, 0.1, 0.1, 0.1),
x = x)
Estimation for MAP data set:
estimation_fun
Description
Estimation for MAP data set:
Usage
estimation_fun(n_control = 10, n_treat = 10, n_rep = 3, x, Y1,
k = 4, nn = 300, phi = NULL, type = NULL, tttt)
Arguments
n_control
Number of time points in control group
n_treat
Number of time points in treatment group
n_rep
Number of replicates in both group
x
Time structured design for the simulated data
Y1
RNA-seq data set: G by (n_control + n_treat) * n_rep matrix
k
Always 4
nn
Number of QMC nodes used in likelihood estimation
phi
the dispersion parameter
type
dispersion parameter estimation type
tttt
Time points used in the design
Details
The vector of read counts for gene g, treatment group i, replicate j, at time point t, Ygij (t), follows a
Negative Binomial distribution parameterized mean λgi and φg , where E[Ygij (t)] = λgi (t). λgi (t)
is further modeled as λgi (t) = Sij exp[ηg1 Ii=2 + B 0 (t)ηg2 Ii=2 + B 0 (t)τg ]. We have B 0 (t) are
design matrix, which is constructed by 2 orthorgonal polynomial bases.
• t = 1,..., n_treat (or n_control if control group);
• j = 1,..., n_rep;
• g = 1,...,G; and
• [ηg1 , ηg2 , τg ] ~ 4-component gausssian mixture model. We used latented negative binomial
model with EM algorithm to estimate the paramters of mixture model.
4
estimation_fun
Value
A list of parameters that we need to use for DE analysis.
data_use List includes the data information
• Y1 RNA-seq data set: G by (n_control + n_treat) * n_rep matrix
• start Starting point of the EM algorithm
• k always 4
• n_basis Number of basis function have been used to construct the design matrix
• X1 Vector indicate control data (0) or treatment data (1)
• x Design matrix been used for estimation
• tttt Time points used in the design
result1 List includes the Parameters estimations
• mu1 Mean parameter for ηg1
• sigma1_2 Variance parameter for ηg1
• sigma2 Variance parameter for τg
• sigma2_2 Variance parameter for ηg2
• p_k Proportion for each mixture component
• aa Dispertion parameter
Examples
library(matlib)
n_basis = 2
n_control = 10
n_treat
= 10
n_rep = 3
tt_treat = c(1:n_treat)/n_treat
nt = length(tt_treat)
ind_t = sort(sample(c(1:nt), n_control))
tt = tt_treat[ind_t]
tttt = c(rep(tt, n_rep), rep(tt_treat, n_rep))
z = x = matrix(0, length(tttt), n_basis)
z[,1] = 1.224745*tttt
z[,2] = -0.7905694 + 2.371708*tttt^2
x[,1] = z[,1] - Proj(z[,1], rep(1, length(tttt)))
x[,2] = z[,2] - Proj(z[,2], rep(1, length(tttt))) - Proj(z[,2], x[,1])
Y1 = data_generation(G = 100,
n_control = n_control,
n_treat
= n_treat,
n_rep
= n_rep,
k_real = 4,
sigma2_r = rep(1, 2),
sigma1_2_r = 1,
sigma2_2_r = c(3,2),
mu1_r = 4,
phi_g_r = rep(1,100),
p_k_real = c(0.7, 0.1, 0.1, 0.1),
x = x)
MAP_test
5
aaa <- proc.time()
est_result <- estimation_fun(n_control = n_control,
n_treat = n_treat,
n_rep = n_rep,
x = x,
Y1 = Y1,
nn = 300,
k = 4,
phi = NULL,
type = 2,
tttt = tttt)
aaa1 <- proc.time()
aaa1 - aaa
#------------gaussian basis construction-----------------# method <- "gaussian"
# n_basis <- 2
# a = 1/4
# b = 2
# c = sqrt(a^2 + 2 * a * b)
# A = a + b + c
# B = b/A
# phi_cal = function(k, x){
# Lambda_k =sqrt(2 * a / A) * B^k
# 1/(sqrt(sqrt(a/c) * 2^k * gamma(k+1))) *
#
exp(-(c-a) * x^2) * hermite(sqrt(2 * c) * x, k, prob = F)
#
# }
# z = do.call(cbind, lapply(c(1:n_basis), phi_cal, x = (tttt - mean(tttt))))
# x = matrix(0, length(tttt), n_basis)
# if(n_basis == 2){
# x[,1] = z[,1] - Proj(z[,1], rep(1, length(tttt)))
# x[,2] = z[,2] - Proj(z[,2], rep(1, length(tttt))) - Proj(z[,2], x[,1])
#
# }else{
# x[,1] = z[,1] - Proj(z[,1], rep(1, length(tttt)))
# x[,2] = z[,2] - Proj(z[,2], rep(1, length(tttt))) - Proj(z[,2], x[,1])
# x[,3] = z[,3] - Proj(z[,3], rep(1, length(tttt))) - Proj(z[,3], x[,1]) - Proj(z[,3], x[,2])
# }
MAP_test
MAP test
Description
MAP test
Usage
MAP_test(est_result, dd = NULL, Type = c(1:6), nn = 6000)
Arguments
est_result
estimation_fun result:
A List contains data_use (the data information) and result1 (Parameters estimations)
6
MAP_test
dd
True index to check the true FDR
Type
Type of hypothesis
• Type = 1 general DE detection
• Type = 2 PDE with no time-by-treatment and NPDE with both treatment
and time-by-treatment
• Type = 3 NPDE with or without time-by-treatment
• Type = 4 PDE with no time-by-treatment
• Type = 5 NPDE with only time-by-treatment
• Type = 6 NPDE with both treatment and time-by-treatment
Number of QMC nodes used to estimate the test statistics
nn
Details
The vector of read counts for gene g, treatment group i, replicate j, at time point t, Ygij (t), follows a
Negative Binomial distribution parameterized mean λgi and φg , where E[Ygij (t)] = λgi (t). λgi (t)
is further modeled as λgi (t) = Sij exp[ηg1 Ii=2 + B 0 (t)ηg2 Ii=2 + B 0 (t)τg ]. We have B 0 (t) are
design matrix, which is constructed by 2 orthorgonal polynomial bases.
• t = 1,..., n_treat (or n_control if control group);
• j = 1,..., n_rep;
• g = 1,...,G; and
• [ηg1 , ηg2 , τg ] ~ 4-component gausssian mixture model. After the parameter estimation we run
the MAP test to detect the DE genes.
Value
A list of results for DE analysis.
• Type Type of hypothesis of interest
• FDR_final FDR estimation of each hypothesis
• FDR_real The true FDR if dd is known
• power Power if dd is known
• T_x Likelihood result for each component
• test_stat Test statistics for each hypothesis
• ct_final_all Test statistics cut-off value at each estimated FDR level
Examples
library(matlib)
n_basis = 2
n_control = 10
n_treat
= 10
n_rep = 3
tt_treat = c(1:n_treat)/n_treat
nt = length(tt_treat)
ind_t = sort(sample(c(1:nt), n_control))
tt = tt_treat[ind_t]
tttt = c(rep(tt, n_rep), rep(tt_treat, n_rep))
z = x = matrix(0, length(tttt), n_basis)
Summary_MAP
7
z[,1] = 1.224745*tttt
z[,2] = -0.7905694 + 2.371708*tttt^2
x[,1] = z[,1] - Proj(z[,1], rep(1, length(tttt)))
x[,2] = z[,2] - Proj(z[,2], rep(1, length(tttt))) - Proj(z[,2], x[,1])
Y1 = data_generation(G = 100,
n_control = n_control,
n_treat = n_treat,
n_rep
= n_rep,
k_real = 4,
sigma2_r = rep(1, 2),
sigma1_2_r = 1,
sigma2_2_r = c(3,2),
mu1_r = 4,
phi_g_r = rep(1,100),
p_k_real = c(0.7, 0.1, 0.1, 0.1),
x = x)
aaa <- proc.time()
est_result <- estimation_fun(n_control = n_control,
n_treat = n_treat,
n_rep = n_rep,
x = x,
Y1 = Y1,
nn = 300,
k = 4,
phi = NULL,
type = 2,
tttt = tttt)
aaa1 <- proc.time()
aaa1 - aaa
G <- 100
k_real <- 4
p_k_real <- c(0.7, 0.1, 0.1, 0.1)
dd = rep(c(0:(k_real-1)), p_k_real * G)
result = MAP_test(est_result = est_result, Type = c(1:6), dd = dd, nn = 300)
Summary for MAP Test
Summary_MAP
Description
Summary for MAP Test
Usage
Summary_MAP(test_result, alpha = 0.05)
Arguments
test_result
MAP_test result:
A list of results for DE analysis
alpha
Nominal level of FDR
8
Summary_MAP
Details
The vector of read counts for gene g, treatment group i, replicate j, at time point t, Ygij (t), follows a
Negative Binomial distribution parameterized mean λgi and φg , where E[Ygij (t)] = lambdagi (t).
λgi (t) is further modeled as λgi (t) = Sij exp[ηg1 Ii=2 + B 0 (t)ηg2 Ii=2 + B 0 (t)τg ] We have B’(t) are
design matrix, which is constructed by 2 orthorgonal polynomial bases.
• t = 1,..., n_treat (or n_control if control group);
• j = 1,..., n_rep;
• g = 1,...,G; and
• [ηg1 , ηg2 , τg ] ~ 4-component gausssian mixture model. After the parameter estimation we run
the MAP test to detect the DE genes. At given nominal level, a list of differential genes are
returned at a given hypothesis.
Value
A list of results for DE analysis.
• Type Type of hypothesis
• Reject_index DE genes list
• FDR_hat Estimated FDR
Examples
library(matlib)
set.seed(1)
n_basis = 2
n_control = 10
n_treat
= 10
n_rep = 3
tt_treat = c(1:n_treat)/n_treat
nt = length(tt_treat)
ind_t = sort(sample(c(1:nt), n_control))
tt = tt_treat[ind_t]
tttt = c(rep(tt, n_rep), rep(tt_treat, n_rep))
z = x = matrix(0, length(tttt), n_basis)
z[,1] = 1.224745*tttt
z[,2] = -0.7905694 + 2.371708*tttt^2
x[,1] = z[,1] - Proj(z[,1], rep(1, length(tttt)))
x[,2] = z[,2] - Proj(z[,2], rep(1, length(tttt))) - Proj(z[,2], x[,1])
Y1 = data_generation(G = 100,
n_control = n_control,
n_treat
= n_treat,
n_rep
= n_rep,
k_real = 4,
sigma2_r = rep(1, 2),
sigma1_2_r = 1,
sigma2_2_r = c(3,2),
mu1_r = 4,
phi_g_r = rep(1,100),
p_k_real = c(0.7, 0.1, 0.1, 0.1),
x = x)
aaa <- proc.time()
est_result <- estimation_fun(n_control = n_control,
Summary_MAP
9
n_treat = n_treat,
n_rep = n_rep,
x = x,
Y1 = Y1,
nn = 300,
k = 4,
phi = NULL,
type = 2,
tttt = tttt)
aaa1 <- proc.time()
aaa1 - aaa
G <- 100
k_real <- 4
p_k_real <- c(0.7, 0.1, 0.1, 0.1)
dd = rep(c(0:(k_real-1)), p_k_real * G)
result = MAP_test(est_result = est_result, Type = c(1:6), dd = dd, nn = 300)
ss <- Summary_MAP(result)
Index
data_generation, 1
estimation_fun, 3
MAP_test, 5
Summary_MAP, 7
10
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.5 Linearized : No Page Count : 10 Page Mode : UseOutlines Author : Title : Subject : Creator : LaTeX with hyperref package Producer : pdfTeX-1.40.16 Create Date : 2019:06:17 13:05:04-06:00 Modify Date : 2019:06:17 13:05:04-06:00 Trapped : False PTEX Fullbanner : This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015) kpathsea version 6.2.1EXIF Metadata provided by EXIF.tools