PECA 2.0 Manual V3.1

PECA 2.0 manual

Zhana Duren

durenzn@gmail.com

Dec 4, 2018

Contents

1. Getting started.
1.1 About PECA
1.2 Installation
1.3 PECA software
2. Network inference
2.1 Prepare input data
2.2 Run PECA network inference
3. Comparison of two networks
3.1 Prepare input data
3.2 Run PECA network comparison
4. Comparison of two groups of networks
4.1 Prepare input data
4.2 Run PECA multiple network comparison

1. Getting started.

1.1 About PECA

The rapid increase of genome-wide data sets on gene expression, chromatin states and transcription factor (TF) binding locations offers an exciting opportunity to interpret the information encoded in genomes and epigenomes. This task can be challenging, as it requires joint modeling of the context-specific activation of cis-regulatory elements (REs) and the effects on transcription of associated regulatory factors. To meet this challenge, we propose a statistical approach based on paired expression and chromatin accessibility (PECA) data across diverse cellular contexts. In our approach, we model 1) the localization to REs of chromatin regulators (CRs) based on their interaction with sequence-specific TFs, 2) the activation of REs due to CRs that are localized to them, and 3) the effect of TFs bound to activated REs on the transcription of target genes (TGs). The transcriptional regulatory network inferred by PECA provides a detailed view of how trans- and cis-regulatory elements work together to affect gene expression in a context-specific manner.

PECA is a statistical tool for gene regulatory network inference from paired gene expression and chromatin accessibility data. If you use the PECA software, please cite:

Duren, Zhana, et al. "Modeling gene regulation from paired expression and chromatin accessibility data." Proceedings of the National Academy of Sciences 114.25 (2017): E4914-E4923.

1.2 Installation

PECA is software for inferring context-specific gene regulatory networks from paired gene expression and chromatin accessibility data. The PECA source code can be downloaded from GitHub: https://github.com/SUwonglab/PECA.

To run PECA, you need to install the following: MATLAB, MACS2, HOMER, SAMtools and BEDTools.

Download and install PECA on Linux:

wget https://github.com/SUwonglab/PECA/archive/master.zip

unzip master.zip

cd PECA-master/

bash install.sh
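Before running install.sh, it can help to confirm that the prerequisites are reachable on your PATH. The sketch below is an illustration only and is not part of PECA: the function name check_tools is hypothetical, and the executable names in the usage line (matlab, macs2, samtools, bedtools and HOMER's findMotifsGenome.pl) are the usual command names for these packages, which this manual does not specify. Adjust them to your installation.

```shell
# Print the name of every tool that cannot be found on PATH.
# Returns 0 when all tools are found, 1 otherwise.
# (check_tools is a hypothetical helper, not shipped with PECA.)
check_tools() {
    ct_status=0
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "$ct_tool"
            ct_status=1
        fi
    done
    return "$ct_status"
}
```

Usage: check_tools matlab macs2 samtools bedtools findMotifsGenome.pl

If every tool is installed, the call prints nothing and returns 0.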

1.3 PECA software

The PECA 2.0 software includes three tools: network inference, comparison of two networks, and comparison of two groups of networks. Before running either network comparison, you need to run network inference.

2. Network inference

To run the PECA network inference tool, follow two steps: i) prepare the input data and ii) run PECA network inference.

2.1 Prepare input data

Put the input files into a folder named ./Input. PECA network inference requires the following three input files: ${SampleName}.txt, ${SampleName}.bam and ${SampleName}.bam.bai. ${SampleName}.txt is the gene expression file; it contains two tab-delimited columns, gene symbol and FPKM (or TPM). ${SampleName}.bam holds the chromatin accessibility data (DNase-seq or ATAC-seq). ${SampleName}.bam.bai is the index file of the BAM file. All three files must share the same base name ${SampleName} and differ only in the extension: ".txt", ".bam" or ".bam.bai". See the RAd4 example in the ./Input directory (RAd4.txt, RAd4.bam and RAd4.bam.bai).
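The naming rules above can be checked before starting a run. The following is a minimal sketch, assuming a POSIX shell and awk; the function name check_sample_inputs is hypothetical and not part of PECA. It only tests that the three files exist, are non-empty, and that the expression file has two tab-separated columns on every line; it does not validate the BAM contents.

```shell
# Check that a directory (default ./Input) holds the three files PECA
# expects for one sample. Prints one line per problem; returns 1 if any.
# (check_sample_inputs is a hypothetical helper, not shipped with PECA.)
check_sample_inputs() {
    cs_name=$1
    cs_dir=${2:-./Input}
    cs_status=0
    for cs_ext in txt bam bam.bai; do
        if [ ! -s "$cs_dir/$cs_name.$cs_ext" ]; then
            echo "missing or empty: $cs_dir/$cs_name.$cs_ext"
            cs_status=1
        fi
    done
    # The expression file should have exactly two tab-separated columns.
    if [ -s "$cs_dir/$cs_name.txt" ]; then
        cs_bad=$(awk -F '\t' 'NF != 2 { n++ } END { print n + 0 }' "$cs_dir/$cs_name.txt")
        if [ "$cs_bad" -ne 0 ]; then
            echo "$cs_bad line(s) in $cs_name.txt do not have 2 tab-separated columns"
            cs_status=1
        fi
    fi
    return "$cs_status"
}
```

Usage: check_sample_inputs RAd4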

2.2 Run PECA network inference

After the input data is prepared, run the following script to do network inference:

sh PECA.sh $sampleName $genome

Example: sh PECA.sh RAd4 mm9

The results will be written to ./Results/${SampleName}/. The output files are described below:

${SampleName}_network.txt is the tissue-specific network. Each row represents one regulation. The first two columns are the TF and the TG. The third column is the regulation score; a higher value indicates a higher likelihood of regulation. Rows are ranked by regulation score. The fourth column is the FDR. The fifth column is the list of regulatory elements (REs, including promoters, enhancers, ...) that regulate the TG and contain an accessible motif binding site of the TF.

TFTG_score.txt is the regulation strength for all TF-TG pairs. Each row represents one TF and each column represents one target gene. A higher value indicates a higher likelihood of regulation.

CRB_pval.txt is the chromatin regulator (CR) binding site matrix. Each column represents one CR, each row represents one region, and the values are p-values.
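As an illustration of working with the column layout of ${SampleName}_network.txt, the sketch below keeps only regulations whose FDR falls under a cutoff. It is not part of PECA. It assumes the file is tab-delimited, which this manual does not state, and the function name filter_network and the default cutoff of 0.05 are hypothetical choices.

```shell
# Print TF, TG and score for regulations whose FDR (column 4) is below a cutoff.
# Lines whose fourth column is not a number (e.g. a header) are skipped.
# Assumes a tab-delimited file; filter_network is a hypothetical helper.
filter_network() {
    fn_file=$1
    fn_cutoff=${2:-0.05}
    awk -F '\t' -v cutoff="$fn_cutoff" '
        $4 ~ /^[0-9.]+([eE][-+]?[0-9]+)?$/ && $4 + 0 < cutoff + 0 {
            print $1 "\t" $2 "\t" $3
        }' "$fn_file"
}
```

Usage: filter_network ./Results/RAd4/RAd4_network.txt 0.01

Because the rows are already ranked by regulation score, the filtered output keeps that order.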

3. Comparison of two networks

If you have two samples and want to compare them at the network level, run the network comparison tool as follows:

3.1 Prepare input data

Prepare two networks: run the PECA network inference tool on the two samples one by one. (Script: sh PECA.sh $sampleName $genome)

3.2 Run PECA network comparison

Run: sh PECA_compare_dif.sh $Sample1 $Sample2 $Organism

Examples: sh PECA_compare_dif.sh K562 GM12878 human

sh PECA_compare_dif.sh mESC RAd4 mouse

Note that $Sample1 and $Sample2 must be consistent with the file names in the Input directory.
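Since the comparison depends on network inference having been run for both samples, a quick pre-check can save a failed run. The sketch below is not part of PECA; it looks for ./Results/${SampleName}/${SampleName}_network.txt, the location and file name described in Section 2.2. The function name have_networks and the RESULTS_DIR override variable are hypothetical.

```shell
# Succeeds only if network inference output exists for every sample given.
# The results root defaults to ./Results; RESULTS_DIR is a hypothetical
# override variable, and have_networks is not shipped with PECA.
have_networks() {
    hn_root=${RESULTS_DIR:-./Results}
    hn_status=0
    for hn_sample in "$@"; do
        hn_file="$hn_root/$hn_sample/${hn_sample}_network.txt"
        if [ ! -f "$hn_file" ]; then
            echo "no network for $hn_sample (expected $hn_file)"
            hn_status=1
        fi
    done
    return "$hn_status"
}
```

Usage: have_networks mESC RAd4 && sh PECA_compare_dif.sh mESC RAd4 mouse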

The results will be written to ./Results/Compare_${Sample1}_${Sample2}/. The output files are described below:

Specific network of two samples: ${Sample1}_specific_network.txt and ${Sample2}_specific_network.txt

Common network of two samples: ${Sample1}_${Sample2}_common_network.txt

Specific module of two networks: ${Sample1}_specific_module.txt and ${Sample2}_specific_module.txt

Common module of two samples: ${Sample1}_${Sample2}_common_module.txt

Either PooledNetwork.txt or PooledModuole.txt can be used to visualize the networks in Cytoscape; the node labels are given in the file Node_label.txt. In PooledNetwork.txt and PooledModuole.txt, "1" and "-1" represent "Activation" and "Repression", respectively. In Node_label.txt, "1" and "2" indicate that the gene is Sample1-specific or Sample2-specific, respectively.
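The naming pattern of the specific and common network and module files can be spelled out with a small helper, for example to confirm that a comparison finished. This is a sketch only and not part of PECA: the function name comparison_outputs is hypothetical, and it lists just the six ${Sample1}/${Sample2} files named in this section, under the results directory given in this section.

```shell
# Print the six specific/common output paths for a two-sample comparison,
# one per line, under ./Results/Compare_<Sample1>_<Sample2>/.
# (comparison_outputs is a hypothetical helper, not shipped with PECA.)
comparison_outputs() {
    co_dir="./Results/Compare_${1}_${2}"
    for co_name in \
        "${1}_specific_network.txt" "${2}_specific_network.txt" \
        "${1}_${2}_common_network.txt" \
        "${1}_specific_module.txt" "${2}_specific_module.txt" \
        "${1}_${2}_common_module.txt"; do
        echo "$co_dir/$co_name"
    done
}
```

Usage: comparison_outputs mESC RAd4 | while read -r f; do [ -f "$f" ] || echo "missing: $f"; done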

4. Comparison of two groups of networks

If you have two conditions (with multiple samples in each condition) and want to compare them at the network level, follow these steps:

4.1 Prepare input data

1, Run the PECA network inference tool on all the samples from the two conditions one by one. (Script: sh PECA.sh $sampleName $genome)

2, Construct labels: write the sample names of Group1 and Group2 into text files named $Group1 and $Group2, respectively. (E.g., create one text file named "Control" containing the sample names of one condition, and another text file named "Case" containing the sample names of the other condition. Note that sample names must be consistent with the file names in the Input directory, and that each sample name file contains one sample name per line.)
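The label files from step 2 can be written and checked with a few lines of shell. The sketch below is not part of PECA, and the function names write_group and check_group are hypothetical. The check treats a name as consistent with the Input directory when a matching .bam file exists there, which is one reasonable reading of the rule above.

```shell
# Write one sample name per line into a group file, e.g. "Control" or "Case".
# (write_group and check_group are hypothetical helpers, not shipped with PECA.)
write_group() {
    wg_file=$1
    shift
    printf '%s\n' "$@" > "$wg_file"
}

# Report every name in a group file that has no matching BAM file in the
# input directory (default ./Input). Returns 1 if any name is unmatched.
check_group() {
    cg_file=$1
    cg_dir=${2:-./Input}
    cg_status=0
    while IFS= read -r cg_name || [ -n "$cg_name" ]; do
        # Skip blank lines.
        [ -z "$cg_name" ] && continue
        if [ ! -f "$cg_dir/$cg_name.bam" ]; then
            echo "unmatched sample name: $cg_name"
            cg_status=1
        fi
    done < "$cg_file"
    return "$cg_status"
}
```

Usage, with made-up sample names: write_group Control ctrl_rep1 ctrl_rep2 followed by check_group Control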

4.2 Run PECA multiple network comparison

3, Run: sh PECA_compare_dif_multiple.sh $Group1 $Group2 $Organism

Example: sh PECA_compare_dif_multiple.sh Control Case human

The results will be written to ./Results/CompareGroup_${Group1}_${Group2}/. The output files are described below:

Specific network of two conditions: ${Group1}_specific_network.txt and ${Group2}_specific_network.txt

Common network of two conditions: ${Group1}_${Group2}_common_network.txt

Specific module of two conditions: ${Group1}_specific_module.txt and ${Group2}_specific_module.txt

Common module of two conditions: ${Group1}_${Group2}_common_module.txt

Either PooledNetwork.txt or PooledModuole.txt can be used to visualize the network in Cytoscape; the node labels are given in the file Node_label.txt. In PooledNetwork.txt and PooledModuole.txt, "1" and "-1" represent "Activation" and "Repression", respectively. In Node_label.txt, "1" and "2" indicate that the gene is Group1-specific or Group2-specific, respectively.
