PECA 2.0 Manual V3.1

PECA 2.0 manual
Zhana Duren
durenzn@gmail.com
Dec 4, 2018
Contents
1. Getting started
1.1 About PECA
1.2 Installation
1.3 PECA software
2. Network inference
2.1 Prepare input data
2.2 Run PECA network inference
3. Comparison of two networks
3.1 Prepare input data
3.2 Run PECA network comparison
4. Comparison of two groups of networks
4.1 Prepare input data
4.2 Run PECA multiple network comparison
1. Getting started
1.1 About PECA
The rapid increase of genome-wide data sets on gene expression, chromatin states
and transcription factor (TF) binding locations offers an exciting opportunity to
interpret the information encoded in genomes and epigenomes. This task is
challenging because it requires jointly modeling the context-specific activation of
cis-regulatory elements (REs) and the effects of the associated regulatory factors on
transcription. To meet this challenge, we propose a statistical approach based on
paired expression and chromatin accessibility (PECA) data across diverse cellular
contexts. In our approach, we model 1) the localization of chromatin regulators (CRs)
to REs based on their interaction with sequence-specific TFs, 2) the activation of
REs by the CRs that are localized to them, and 3) the effect of TFs bound to
activated REs on the transcription of target genes (TGs). The transcriptional
regulatory network inferred by PECA provides a detailed view of how trans- and
cis-regulatory elements work together to affect gene expression in a
context-specific manner.
PECA is a statistical tool for gene regulatory network inference from paired gene
expression and chromatin accessibility data. If you use PECA software, please cite:
Duren, Zhana, et al. "Modeling gene regulation from paired expression and
chromatin accessibility data." Proceedings of the National Academy of Sciences
114.25 (2017): E4914-E4923.
1.2 Installation
PECA is software for inferring context-specific gene regulatory networks from
paired gene expression and chromatin accessibility data. The PECA source code
can be downloaded from GitHub: https://github.com/SUwonglab/PECA.
To run PECA, you need to install the following tools:
MATLAB, MACS2, HOMER, SAMtools and bedtools.
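Before running install.sh, it may help to confirm that these prerequisites are on your PATH. The sketch below is a hypothetical helper (`check_tools` is not part of PECA). The usage line assumes the usual executable names `matlab`, `macs2`, `samtools` and `bedtools`, and uses HOMER's `findMotifsGenome.pl` script as a stand-in for a HOMER installation. Adjust the names to match your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PECA): report which executables cannot be
# found on PATH. The tool names are supplied by the caller; the names in the
# usage note are assumptions about a typical installation.
# Prints each missing name on its own line; returns 0 only if all were found.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `check_tools matlab macs2 findMotifsGenome.pl samtools bedtools || echo "install the tools listed above"` prints the name of each tool it cannot find.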
Download and install PECA on Linux:
wget https://github.com/SUwonglab/PECA/archive/master.zip
unzip master.zip
cd PECA-master/
bash install.sh
1.3 PECA software
The PECA 2.0 software includes three tools: network inference, comparison of two
networks, and comparison of two groups of networks. Both comparison tools require
network inference to be run first.
2. Network inference
Running the PECA network inference tool takes two steps: i) prepare the input data
and ii) run PECA network inference.
2.1 Prepare input data
Put the input files into a folder named ./Input. PECA network inference requires the
following three input files: ${SampleName}.txt, ${SampleName}.bam and
${SampleName}.bam.bai. ${SampleName}.txt is the gene expression file with two
tab-delimited columns: gene symbol and FPKM (or TPM). ${SampleName}.bam contains
the chromatin accessibility data (DNase-seq or ATAC-seq). ${SampleName}.bam.bai is
the index file of the BAM file. Note that all three files must share the same file
name before the extension, ${SampleName}; they differ only in the extension ".txt",
".bam" or ".bam.bai". See the example RAd4 in the ./Input directory (RAd4.txt,
RAd4.bam and RAd4.bam.bai).
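A quick pre-flight check can catch mismatched file names before a long run. The function below is a hypothetical sketch and is not part of PECA. It assumes the ./Input layout described above, checks that all three files exist and are non-empty, and checks that every line of the expression file has exactly two tab-separated fields. If the index file is missing, `samtools index ${SampleName}.bam` normally writes `${SampleName}.bam.bai` next to the BAM file.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of PECA).
# Usage: check_peca_inputs SampleName [InputDir]   (InputDir defaults to ./Input)
# Returns 0 if the three input files look usable, 1 otherwise (reasons on stderr).
check_peca_inputs() {
  local sample=$1 dir=${2:-./Input} ext ok=0
  for ext in txt bam bam.bai; do
    if [ ! -s "$dir/$sample.$ext" ]; then
      echo "missing or empty: $dir/$sample.$ext" >&2
      ok=1
    fi
  done
  # Expression file: exactly two tab-delimited columns (gene symbol, FPKM/TPM).
  if [ -s "$dir/$sample.txt" ] &&
     ! awk -F'\t' 'NF != 2 { bad = 1; exit } END { exit bad }' "$dir/$sample.txt"; then
    echo "$dir/$sample.txt: expected 2 tab-separated columns on every line" >&2
    ok=1
  fi
  return "$ok"
}
```

For the bundled example, `check_peca_inputs RAd4` should return 0 when run from the PECA-master directory.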
2.2 Run PECA network inference
After preparing the input data, run the following script to perform network inference:
sh PECA.sh $SampleName $genome
Example: sh PECA.sh RAd4 mm9
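A thin wrapper can run the inference and then confirm that the main output file was written. This matters because this manual does not say whether PECA.sh exits with a non-zero status on failure. The following is a sketch under stated assumptions. `run_peca` is a hypothetical name. It expects the path of the directory that contains PECA.sh, ./Input and ./Results, and it treats a missing or empty ./Results/${SampleName}/${SampleName}_network.txt as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of PECA).
# Usage: run_peca SampleName genome [PecaDir]   e.g. run_peca RAd4 mm9 ~/PECA-master
# PecaDir is assumed to hold PECA.sh, Input/ and Results/ (defaults to the current dir).
# On success, prints the path of the network file.
run_peca() {
  local sample=$1 genome=$2 peca_dir=${3:-.}
  # Run in a subshell so the caller's working directory is unchanged.
  if ! ( cd "$peca_dir" && sh PECA.sh "$sample" "$genome" ); then
    echo "PECA.sh failed for $sample" >&2
    return 1
  fi
  # Assumption: a non-empty network file means inference finished.
  local net="$peca_dir/Results/$sample/${sample}_network.txt"
  if [ ! -s "$net" ]; then
    echo "expected output not found: $net" >&2
    return 1
  fi
  echo "$net"
}
```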
The results are written to ./Results/${SampleName}/. The output files are described
below:
${SampleName}_network.txt is the tissue-specific network. Each row represents one
regulation, and rows are ranked by regulation score. The first two columns are the
TF and the TG. The third column is the regulation score; a higher value indicates a
higher likelihood of regulation. The fourth column is the FDR. The fifth column lists
the regulatory elements (REs, including promoters, enhancers, …) that regulate the
TG and contain an accessible binding site for the TF's motif.
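Each row of ${SampleName}_network.txt is one TF-TG regulation, so ordinary text tools are enough to extract, for example, the targets of one TF below an FDR cutoff. This sketch assumes that the file is tab-delimited and follows the column order given above (TF, TG, score, FDR, REs). `network_targets` is a hypothetical helper. A header line, if present, is skipped because its FDR field is not numeric.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PECA): print TG, score and FDR for every
# regulation of one TF whose FDR is at or below a cutoff.
# Assumes tab-delimited columns: TF, TG, score, FDR, REs.
# Usage: network_targets NetworkFile TF [FdrCutoff]   (cutoff defaults to 0.05)
network_targets() {
  local file=$1 tf=$2 cutoff=${3:-0.05}
  awk -F'\t' -v OFS='\t' -v tf="$tf" -v cut="$cutoff" '
    # Skip lines whose FDR column is not a number (e.g. a header line).
    $4 !~ /^[0-9.eE+-]+$/ { next }
    $1 == tf && $4 + 0 <= cut + 0 { print $2, $3, $4 }
  ' "$file"
}
```

The input rows are already ranked by regulation score, and awk preserves that order in the output.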
TFTG_score.txt contains the regulation strength for all TF-TG pairs. Each row
represents one TF and each column represents one target gene; a higher value
indicates a higher likelihood of regulation.
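To look up a single TF-TG pair in TFTG_score.txt, you first need the column index of the target gene. The sketch relies on a layout that the description above does not spell out. It assumes the file is tab-delimited, the first (header) row holds target-gene names after one leading corner cell, and the first column holds TF names. Check your file before relying on this. `tftg_score` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PECA). Assumed layout of TFTG_score.txt:
#   row 1      : <corner cell> TG1 TG2 ...
#   rows 2..n  : TF  score(TF,TG1) score(TF,TG2) ...
# Usage: tftg_score TFTG_score.txt TF TG   -> prints the score, or nothing if absent
tftg_score() {
  awk -F'\t' -v tf="$2" -v tg="$3" '
    # Find the column that holds the requested target gene.
    NR == 1 { for (i = 2; i <= NF; i++) if ($i == tg) col = i; next }
    # Print the score from the matching TF row, then stop.
    col && $1 == tf { print $col; exit }
  ' "$1"
}
```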
CRB_pval.txt is the chromatin regulator (CR) binding-site matrix: each column
represents one CR, each row represents one region, and the values are p-values.
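A useful first look at CRB_pval.txt is to count how many regions each CR is predicted to bind at a given p-value threshold. The layout here is also assumed. The sketch expects a tab-delimited file with a header row of CR names after one corner cell, and one region per row with its identifier in the first column. `crb_count` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PECA): for each CR column, count the
# regions whose p-value is below a threshold.
# Assumed layout: header row "<corner> CR1 CR2 ...", then "region p1 p2 ...".
# Usage: crb_count CRB_pval.txt [Threshold]   (threshold defaults to 0.01)
crb_count() {
  awk -F'\t' -v OFS='\t' -v t="${2:-0.01}" '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
    { for (i = 2; i <= ncol; i++) if ($i != "" && $i + 0 < t + 0) n[i]++ }
    END { for (i = 2; i <= ncol; i++) print name[i], n[i] + 0 }
  ' "$1"
}
```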
3. Comparison of two networks
If you have two samples and want to compare them at the network level, run the
network comparison tool with the following steps:
3.1 Prepare input data
1, Prepare two networks: run the PECA network inference tool on each of the two
samples, one at a time (script: sh PECA.sh $SampleName $genome).
3.2 Run PECA network comparison
2, Run: sh PECA_compare_dif.sh $Sample1 $Sample2 $Organism
Examples: sh PECA_compare_dif.sh K562 GM12878 human
sh PECA_compare_dif.sh mESC RAd4 mouse
Note that $Sample1 and $Sample2 must match the file names in the Input
directory.
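PECA_compare_dif.sh works from the per-sample inference results, so it is worth confirming that both samples have finished before starting the comparison. The sketch below is hypothetical (`compare_two` is not part of PECA). It assumes it is pointed at the directory containing the PECA scripts, Input/ and Results/. It treats the presence of Input/${Sample}.txt and a non-empty Results/${Sample}/${Sample}_network.txt as the readiness test.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of PECA).
# Usage: compare_two Sample1 Sample2 Organism [PecaDir]
#   e.g. compare_two K562 GM12878 human
# On success, prints the expected result directory.
compare_two() {
  local s1=$1 s2=$2 organism=$3 dir=${4:-.} s
  for s in "$s1" "$s2"; do
    if [ ! -f "$dir/Input/$s.txt" ]; then
      echo "$s: no Input/$s.txt (names must match the Input files)" >&2
      return 1
    fi
    # Assumption: a non-empty network file means inference finished.
    if [ ! -s "$dir/Results/$s/${s}_network.txt" ]; then
      echo "$s: run 'sh PECA.sh $s <genome>' first" >&2
      return 1
    fi
  done
  ( cd "$dir" && sh PECA_compare_dif.sh "$s1" "$s2" "$organism" ) || return 1
  echo "$dir/Results/Compare_${s1}_${s2}"
}
```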
The results are written to ./Results/Compare_${Sample1}_${Sample2}/. The output
files are:
Sample-specific networks: ${Sample1}_specific_network.txt and
${Sample2}_specific_network.txt
Common network of the two samples: ${Sample1}_${Sample2}_common_network.txt
Sample-specific modules: ${Sample1}_specific_module.txt and
${Sample2}_specific_module.txt
Common module of the two samples: ${Sample1}_${Sample2}_common_module.txt
The files PooledNetwork.txt and PooledModuole.txt can be loaded into Cytoscape to
visualize the networks, with node labels given in Node_label.txt. In
PooledNetwork.txt and PooledModuole.txt, "1" and "-1" represent "Activation" and
"Repression", respectively. In Node_label.txt, "1" and "2" indicate that the gene is
Sample1-specific or Sample2-specific, respectively.
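Before loading these files into Cytoscape, a quick tally of activating and repressing edges, or of Sample1- and Sample2-specific nodes, can serve as a sanity check. The description above gives the meaning of the values but not the column layout, so the sketch takes the column number as an argument. `count_values` is a hypothetical helper. A header line, if present, would be counted as its own value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PECA): count how often each value occurs in
# one tab-delimited column, e.g. 1 (activation) / -1 (repression) in
# PooledNetwork.txt, or 1 / 2 (Sample1 / Sample2 specific) in Node_label.txt.
# The column number is an assumption you must check against your files.
# Usage: count_values File Column
count_values() {
  awk -F'\t' -v c="$2" 'NF >= c + 0 { n[$c]++ } END { for (v in n) print v "\t" n[v] }' "$1" |
    LC_ALL=C sort   # C locale keeps the ordering of "-1" and "1" predictable
}
```

For example, if the sign turns out to be in the third column, `count_values PooledNetwork.txt 3` prints one line per value with its count.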
4. Comparison of two groups of networks
If you have two conditions (with multiple samples in each condition) and want to
compare them at the network level, follow these steps:
4.1 Prepare input data
1, Run the PECA network inference tool on all samples from both conditions, one at
a time (script: sh PECA.sh $SampleName $genome).
2, Construct labels: write the sample names of Group1 and Group2 into text files
named $Group1 and $Group2, respectively, with one sample name per line. For
example, create a text file named "Control" containing the sample names of one
condition, and another text file named "Case" containing the sample names of the
other condition. Note that the sample names must match the file names in the Input
directory.
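The label files are plain text with one sample name per line. The sketch below checks every listed name against ./Input and shows, in comments, how the files could be written with printf. The sample names `Ctrl_1`, `Case_1` and so on, and the helper name `check_group_file`, are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PECA): verify that every sample named in a
# group file (one name per line) has a matching expression file in InputDir.
# Usage: check_group_file GroupFile [InputDir]   (InputDir defaults to ./Input)
check_group_file() {
  local group=$1 dir=${2:-./Input} name status=0
  # "|| [ -n "$name" ]" also handles a last line without a trailing newline.
  while IFS= read -r name || [ -n "$name" ]; do
    if [ -z "$name" ]; then continue; fi   # ignore blank lines
    if [ ! -f "$dir/$name.txt" ]; then
      echo "$group: no $dir/$name.txt" >&2
      status=1
    fi
  done < "$group"
  return "$status"
}

# Example with placeholder sample names:
# printf '%s\n' Ctrl_1 Ctrl_2 > Control
# printf '%s\n' Case_1 Case_2 > Case
# check_group_file Control && check_group_file Case
```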
4.2 Run PECA multiple network comparison
3, Run: sh PECA_compare_dif_multiple.sh $Group1 $Group2 $Organism
Example: sh PECA_compare_dif_multiple.sh Control Case human
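Steps 1 to 3 can be chained in one script. It reads the two label files, runs network inference for every listed sample that does not yet have a result, and then starts the group comparison. This is a sketch under several assumptions. `compare_groups` is a hypothetical name. It runs from the directory containing the PECA scripts. The genome argument (for PECA.sh) is passed separately from the organism argument (for the comparison script). An existing non-empty Results/${SampleName}/${SampleName}_network.txt is taken to mean that inference has already finished.

```shell
#!/usr/bin/env bash
# Hypothetical driver (not part of PECA). Run from the PECA-master directory.
# Usage: compare_groups Group1File Group2File Genome Organism
# On success, prints the expected result directory.
compare_groups() {
  local g1=$1 g2=$2 genome=$3 organism=$4 f name
  # Step 1: network inference for each listed sample without a result yet.
  for f in "$g1" "$g2"; do
    while IFS= read -r name || [ -n "$name" ]; do
      if [ -z "$name" ] || [ -s "Results/$name/${name}_network.txt" ]; then
        continue
      fi
      echo "running PECA.sh for $name" >&2
      # </dev/null stops PECA.sh from consuming the rest of the group file.
      if ! sh PECA.sh "$name" "$genome" </dev/null; then
        echo "PECA.sh failed for $name" >&2
        return 1
      fi
    done < "$f"
  done
  # Step 3: compare the two groups.
  sh PECA_compare_dif_multiple.sh "$g1" "$g2" "$organism" </dev/null || return 1
  echo "Results/CompareGroup_${g1}_${g2}"
}
```

Redirecting PECA.sh's standard input is a deliberate choice. Without it, any command inside PECA.sh that reads stdin could silently swallow the remaining sample names.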
The results are written to ./Results/CompareGroup_${Group1}_${Group2}/. The output
files are:
Condition-specific networks: ${Group1}_specific_network.txt and
${Group2}_specific_network.txt
Common network of the two conditions: ${Group1}_${Group2}_common_network.txt
Condition-specific modules: ${Group1}_specific_module.txt and
${Group2}_specific_module.txt
Common module of the two conditions: ${Group1}_${Group2}_common_module.txt
The files PooledNetwork.txt and PooledModuole.txt can be loaded into Cytoscape to
visualize the networks, with node labels given in Node_label.txt. In
PooledNetwork.txt and PooledModuole.txt, "1" and "-1" represent "Activation" and
"Repression", respectively. In Node_label.txt, "1" and "2" indicate that the gene is
Group1-specific or Group2-specific, respectively.
