PECA 2.0 Manual V3.1
PECA 2.0 manual
Zhana Duren
durenzn@gmail.com
Dec 4, 2018

Contents
1. Getting started
1.1 About PECA
1.2 Installation
1.3 PECA software
2. Network inference
2.1 Prepare input data
2.2 Run PECA network inference
3. Comparison of two networks
3.1 Prepare input data
3.2 Run PECA network comparison
4. Comparison of two groups of networks
4.1 Prepare input data
4.2 Run PECA multiple network comparison

1. Getting started

1.1 About PECA
The rapid increase of genome-wide data sets on gene expression, chromatin states and transcription factor (TF) binding locations offers an exciting opportunity to interpret the information encoded in genomes and epigenomes. This task can be challenging as it requires joint modeling of context specific activation of cis-regulatory elements (RE) and the effects on transcription of associated regulatory factors. To meet this challenge, we propose a statistical approach based on paired expression and chromatin accessibility (PECA) data across diverse cellular contexts.
In our approach, we model 1) the localization of chromatin regulators (CRs) to REs based on their interaction with sequence-specific TFs, 2) the activation of REs by the CRs localized to them, and 3) the effect of TFs bound to activated REs on the transcription of target genes (TGs). The transcriptional regulatory network inferred by PECA provides a detailed view of how trans- and cis-regulatory elements work together to affect gene expression in a context-specific manner. PECA is a statistical tool for gene regulatory network inference from paired gene expression and chromatin accessibility data.

If you use PECA software, please cite:
Duren, Zhana, et al. "Modeling gene regulation from paired expression and chromatin accessibility data." Proceedings of the National Academy of Sciences 114.25 (2017): E4914-E4923.

1.2 Installation
PECA is software for inferring context-specific gene regulatory networks from paired gene expression and chromatin accessibility data. The PECA source code can be downloaded from GitHub: https://github.com/SUwonglab/PECA. To run PECA, you need to install the following: MATLAB, MACS2, HOMER, SAMtools and BEDTools.

Download and install PECA on Linux:
    wget https://github.com/SUwonglab/PECA/archive/master.zip
    unzip master.zip
    cd PECA-master/
    bash install.sh

1.3 PECA software
PECA 2.0 includes three tools: network inference, comparison of two networks, and comparison of two groups of networks. Both comparison tools work on inferred networks, so you must run network inference first.

2. Network inference
To run the PECA network inference tool, follow two steps: i) prepare the input data and ii) run PECA network inference.

2.1 Prepare input data
Put the input files into the folder named ./Input. PECA network inference requires the following three input files: ${SampleName}.txt, ${SampleName}.bam, ${SampleName}.bam.bai.
${SampleName}.txt is the gene expression file. It contains two tab-delimited columns: gene symbol and FPKM (or TPM).
${SampleName}.bam is the chromatin accessibility data (DNase-seq or ATAC-seq).
${SampleName}.bam.bai is the index file of the BAM file.
Note that all three files must share the same base name ${SampleName}; they differ only in the extension: ".txt", ".bam" or ".bam.bai". See the RAd4 example in the ./Input directory (RAd4.txt, RAd4.bam, and RAd4.bam.bai).

2.2 Run PECA network inference
After the input data is prepared, run the following script to do network inference:
    sh PECA.sh $sampleName $genome
Example:
    sh PECA.sh RAd4 mm9
The results will be written to ./Results/${SampleName}/. The output files are described below.
${SampleName}_network.txt is the tissue-specific network. Each row represents one regulation. The first two columns are the TF and the TG. The third column is the regulation score; a higher value represents a higher possibility of regulation. Rows are ranked by regulation score. The fourth column is the FDR. The fifth column is the list of regulatory elements (REs, including promoters, enhancers, …) that regulate the TG and contain an accessible motif binding site of the TF.
TFTG_score.txt is the regulation strength for all TF-TG pairs. Each row represents one TF and each column represents one target gene. A higher value represents a higher possibility of regulation.
CRB_pval.txt is the chromatin regulator (CR) binding site matrix. Each column represents one CR, each row represents one region, and the values are p-values.

3. Comparison of two networks
If you have two samples and want to compare them at the network level, run the network comparison tool with the following steps:

3.1 Prepare input data
1, Prepare two networks: run the PECA network inference tool on the two samples, one by one.
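Running inference on both samples can be sketched as a small shell loop that prints the inference command for each sample and, when asked, checks the three input files from section 2.1 before running it. This is a minimal sketch under stated assumptions, not part of PECA: the helper names check_inputs and run_inference and the DRY_RUN variable are hypothetical, and using mm9 for both mESC and RAd4 is assumed for illustration.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not part of PECA.

# Return 0 if all three input files for a sample exist in ./Input.
check_inputs() {
    sample=$1
    missing=0
    for ext in txt bam bam.bai; do
        if [ ! -f "./Input/${sample}.${ext}" ]; then
            echo "missing: ./Input/${sample}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Print the inference command for each sample; run it only when DRY_RUN=0.
run_inference() {
    genome=$1
    shift
    for sample in "$@"; do
        echo "sh PECA.sh ${sample} ${genome}"
        if [ "${DRY_RUN:-1}" = 0 ]; then
            check_inputs "$sample" && sh PECA.sh "$sample" "$genome"
        fi
    done
}

# Assumed example: both samples use the mm9 genome.
run_inference mm9 mESC RAd4
```

By default the sketch only prints the commands. Setting DRY_RUN=0 from inside the PECA-master directory would run them, skipping any sample whose input files are incomplete.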
For each sample, the script is:
    sh PECA.sh $sampleName $genome

3.2 Run PECA network comparison
2, Run:
    sh PECA_compare_dif.sh $Sample1 $Sample2 $Organism
Examples:
    sh PECA_compare_dif.sh K562 GM12878 human
    sh PECA_compare_dif.sh mESC RAd4 mouse
Note that $Sample1 and $Sample2 must be consistent with the file names in the Input directory. The results will be written to ./Results/Compare_${Sample1}_${Sample2}/. The output files are:
Specific network of two samples: ${Sample1}_specific_network.txt and ${Sample2}_specific_network.txt
Common network of two samples: ${Sample1}_${Sample2}_common_network.txt
Specific module of two samples: ${Sample1}_specific_module.txt and ${Sample2}_specific_module.txt
Common module of two samples: ${Sample1}_${Sample2}_common_module.txt
The files PooledNetwork.txt and PooledModuole.txt can be used to visualize the networks in Cytoscape; the node labels are given in the file Node_label.txt. "1" and "-1" in PooledNetwork.txt or PooledModuole.txt represent "Activation" and "Repression", respectively. "1" and "2" in Node_label.txt indicate that the gene is Sample1 specific or Sample2 specific, respectively.

4. Comparison of two groups of networks
If you have two conditions (with multiple samples in each condition) and want to compare them at the network level, follow these steps:

4.1 Prepare input data
1, Run the PECA network inference tool on all samples from both conditions, one by one. (Script: sh PECA.sh $sampleName $genome)
2, Construct labels: write the sample names of Group1 and Group2 into text files named $Group1 and $Group2, respectively. (E.g., create one text file named "Control" containing the sample names of one condition, and another text file named "Case" containing the sample names of the other condition. Note that the sample names must be consistent with the file names in the Input directory, and that each sample name file contains one sample name per line.)
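The label files from step 2 can be written in shell as follows. This is a minimal sketch, not part of PECA: the sample names (ctrl_rep1, case_rep1, and so on) are hypothetical placeholders, the helper report_missing is hypothetical, and the consistency check assumes the ./Input naming convention from section 2.1.

```shell
#!/bin/sh
# Sketch only: sample names below are hypothetical placeholders.

# Write one sample name per line into each group file.
printf '%s\n' ctrl_rep1 ctrl_rep2 > Control
printf '%s\n' case_rep1 case_rep2 > Case

# Report sample names that have no matching expression file in ./Input.
# Prints nothing when every listed sample is present.
report_missing() {
    group_file=$1
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue    # skip blank lines
        [ -f "./Input/${sample}.txt" ] || echo "${group_file}: no ./Input/${sample}.txt"
    done < "$group_file"
}

report_missing Control
report_missing Case
```

Run it from the directory that contains ./Input so that the group files and the consistency check refer to the same samples.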
4.2 Run PECA multiple network comparison
3, Run:
    sh PECA_compare_dif_multiple.sh $Group1 $Group2 $Organism
Example:
    sh PECA_compare_dif_multiple.sh Control Case human
The results will be written to ./Results/CompareGroup_${Group1}_${Group2}. The output files are:
Specific network of two conditions: ${Group1}_specific_network.txt and ${Group2}_specific_network.txt
Common network of two conditions: ${Group1}_${Group2}_common_network.txt
Specific module of two conditions: ${Group1}_specific_module.txt and ${Group2}_specific_module.txt
Common module of two conditions: ${Group1}_${Group2}_common_module.txt
The files PooledNetwork.txt and PooledModuole.txt can be used to visualize the network in Cytoscape; the node labels are given in the file Node_label.txt. "1" and "-1" in PooledNetwork.txt or PooledModuole.txt represent "Activation" and "Repression", respectively. "1" and "2" in Node_label.txt indicate that the gene is Group1 specific or Group2 specific, respectively.
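After the comparison finishes, it can be useful to confirm that the specific and common output files listed above were written. The following is a minimal sketch, not part of PECA: the helper name list_group_outputs is hypothetical, and the paths are built only from the file names given in this section.

```shell
#!/bin/sh
# Sketch only: list_group_outputs is a hypothetical helper, not part of PECA.

# Print the expected result paths for a group comparison, one per line.
list_group_outputs() {
    g1=$1
    g2=$2
    dir="./Results/CompareGroup_${g1}_${g2}"
    for name in \
        "${g1}_specific_network.txt" \
        "${g2}_specific_network.txt" \
        "${g1}_${g2}_common_network.txt" \
        "${g1}_specific_module.txt" \
        "${g2}_specific_module.txt" \
        "${g1}_${g2}_common_module.txt"
    do
        echo "${dir}/${name}"
    done
}

# Mark each expected file as present or missing.
list_group_outputs Control Case | while IFS= read -r path; do
    if [ -f "$path" ]; then
        echo "ok      $path"
    else
        echo "missing $path"
    fi
done
```

Replace Control and Case with the names of your own group files.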