TAPES Manual
TAPES - INSTRUCTION MANUAL

TAPES, a Tool for Assessment and Prioritisation in Exome Studies, is a script written in Python 3.7 which serves three purposes:

1. Be a simplified interface to ANNOVAR (http://annovar.openbioinformatics.org/en/latest/) with easy database management and easy commands for annotation.
2. Prioritise variants using the ACMG 2015 criteria (DOI: 10.1038/gim.2015.30) and a probability of pathogenicity to classify variants from pathogenic to benign.
3. Create appropriate reports for researchers based on relevant criteria.

TAPES focuses on multi-sample VCF files and disease cohorts, but any file annotated with ANNOVAR can be used.

COMPATIBILITY

The main TAPES function, sort, works on both UNIX and Windows. The ANNOVAR interface only works on UNIX because of ANNOVAR compatibility. TAPES was written and tested on Python 3.7 and will work on any Python 3 version.

Table of Contents

INSTALLATION
RUNNING A PRIORITISATION JOB
    Prioritise the annotated file
        FOLDER MODE
        CSV/TXT+XLSX MODE
    Sorting and reporting options
    Output explained
ANNOVAR INTERFACE
    First Use
    Simplified
        DOWNLOADING DATABASES
        ANNOTATING VCF FILE
    Advanced
        DATABASE MANAGEMENT
        ANNOTATION
        ANNOTATION OPTIONS
DECOMPOSING VCF
RE-ANALYSING TAPES OUTPUTS
APPENDIX
    KEGG Pathways keys
    EnrichR Libraries
    ACMG Criteria assignment

1) INSTALLATION

TAPES does not require installation. Just download the repository at https://github.com/a-xavier/tapes and extract it to any convenient location.

If pip is not installed on your system, you can install it easily:

Install PIP on Debian/Ubuntu
    apt install python3-pip

Install PIP on Fedora
    dnf install python3-pip

Install PIP on Arch Linux
    pacman -S python-pip

Install PIP on Windows
First install Python 3 from https://www.python.org/downloads/ and add python and pip to your path in the environment variable menu. (On Windows 7: Control Panel -> System -> Advanced System Settings -> Environment variables, then under System Variables double click on path and add the Python installation path, separated by a semicolon ";".) Then use either cmd.exe or Windows PowerShell to run TAPES.

Using pip, you can install all the requirements with:
    cd path/to/TAPES
    pip install --upgrade -r requirements.txt

This will install all the required Python modules. Note that since TAPES is written in Python 3, you might need to run it using python3 instead of python, depending on your system.
If you plan on using TAPES as an ANNOVAR wrapper, please download ANNOVAR first here: http://annovar.openbioinformatics.org/en/latest/

2) RUNNING A PRIORITISATION JOB

1) Prioritise the annotated file

To prioritise your variants, use the sort option. There are two main output modes: FOLDER and CSV/TXT+XLSX. When writing the output, just specify a folder or a csv file to choose the mode (see examples below). In both modes, the flag --acmg can be added.

Using the --acmg flag ensures that all the main annotations for ACMG classification are present before the sorting process. If you are not sure your annotated file is fully compliant with TAPES, you can remove the --acmg flag. If the --acmg flag is not present, TAPES will annotate as much as it can based on the annotations present. This ensures that even older files annotated with ANNOVAR can be prioritised to a certain extent.

a) FOLDER MODE

This mode will output a folder with different csv files and figures based on the options:

    python tapes.py sort -i /path/to/annotated/file.csv -o /path/to/output/folder/
Will output csv files

    python tapes.py sort -i /to/annotated/file.csv -o /to/output/folder/ --tab
Will output tab-separated files

The output must be either an empty folder or a non-existent folder.
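The rule that the output path selects the mode, and that a folder output must be empty or not yet exist, can be checked before launching a long job. The following is a minimal sketch and not the code TAPES itself uses: the helper names infer_output_mode and folder_is_usable are hypothetical, and treating every path without a .csv, .txt or .tsv extension as a folder is an assumption made for illustration.

```python
from pathlib import Path


def infer_output_mode(output: str) -> str:
    """Guess the TAPES output mode from the -o value (hypothetical helper).

    Assumption: anything that is not .csv/.txt/.tsv is treated as a folder.
    """
    suffix = Path(output).suffix.lower()
    if suffix == ".csv":
        return "csv+xlsx"
    if suffix in {".txt", ".tsv"}:
        return "tab+xlsx"
    return "folder"


def folder_is_usable(folder: str) -> bool:
    """A folder output must either not exist yet or exist and be empty."""
    path = Path(folder)
    if not path.exists():
        return True
    return path.is_dir() and not any(path.iterdir())


print(infer_output_mode("/path/to/output.csv"))      # csv+xlsx
print(infer_output_mode("/path/to/output/folder/"))  # folder
```

Running folder_is_usable on the intended output folder before calling sort avoids discovering a non-empty folder only after the job has started.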
b) CSV/TXT+XLSX MODE

This mode will output a csv file and an xlsx report containing different spreadsheets based on the options:

    python tapes.py sort -i /path/to/annotated/file.csv -o /path/to/output.csv
Will output csv + xlsx files

    python tapes.py sort -i /path/to/annotated/file.csv -o /path/to/output.txt
Will output tab-separated txt + xlsx files (.tsv also works)

2) Sorting and reporting options

Option | Type | Description | Default
--acmg | flag | Perform check for main annotations before sorting | -
--trio | path to txt file | A trio text file (see specification) | -
--by_sample | flag | Create output with the 5 most pathogenic variants per sample | -
--enrichr | str | Use EnrichR to analyse the pathways impacted by pathogenic variants | GO_Biological_Process_2018
--disease | str | Check the 'disease' column for the presence of a term | cancer
--list | str or path to txt file | A list of genes of interest (in quotes, separated by spaces) or a text file with one gene symbol per line | -
--kegg | str | Similar to --list, but for when you do not know all genes of interest. Select a pathway and a report will be created with only the genes involved in that pathway (see Appendix for the full list of available pathways) | -
--by_gene | flag | Create output ranking each gene based on a simple gene-burden metric | -

Notes on --trio:
The trio file must be a tab-delimited file with the following info:
- Role in family: m, f and o (in no particular order) for mother, father and offspring
- Trio id: any string without spaces
- Sample name: as it appears in the original vcf file
Only use UNIQUE trio IDs; if there are several trios in one family, use different IDs or the result will be incorrect.

Note on the --by_gene flag:
The --by_gene flag will create a report grouping all variants that are predicted to be pathogenic and are contained in a single gene. The metric used to measure gene burden is quite simple:

    Burden_{gene} = \sum P_{variant} \times N_{sample}

where P_{variant} is the probability for a variant to be pathogenic and N_{sample} is the number of samples affected by this variant.
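As a worked illustration of the burden formula, the sketch below sums P_variant * N_sample per gene. It is not TAPES's own implementation: the function name gene_burden and the (gene, probability, n_samples) tuple layout are hypothetical. The input values are the three DNAH17 rows from the By-Gene report example in this manual, each variant being carried by two samples.

```python
from collections import defaultdict


def gene_burden(variants):
    """Sum P_variant * N_sample for every gene.

    `variants` is an iterable of (gene, probability, n_samples) tuples;
    this layout is a hypothetical input format chosen for the example.
    """
    burden = defaultdict(float)
    for gene, probability, n_samples in variants:
        burden[gene] += probability * n_samples
    return dict(burden)


# DNAH17 rows from the By-Gene report example: each variant affects 2 samples.
example = [
    ("DNAH17", 0.9878, 2),
    ("DNAH17", 0.9492, 2),
    ("DNAH17", 0.8999, 2),
]
scores = gene_burden(example)
print(round(scores["DNAH17"], 4))  # 5.6738, the score shown in the report
```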
Since this score does not account for several other parameters, a number of warnings are also present:
- Number of samples warning: raised if more than half of the variants of a gene are present in more than half of the samples. This means that the number of samples affected is suspiciously high. This can happen with misaligned reads in X and Y homologous regions, for example.
- Long gene warning: if the gene is long (more than 250,000 bp), more variants are expected.
- FLAGS gene: FLAGS genes are the most frequently mutated genes in exome sequencing. See https://doi.org/10.1186/s12920-014-0064-y for more details.

3) Output Explained

a) Main Output

The output files will always be sorted csv/txt/tsv or xlsx files. The variants are sorted from most pathogenic to most benign.
Apart from the classical ACMG classification (see the original paper, S Richards et al - 2015), TAPES also provides an estimated probability of pathogenicity calculated following S.V. Tavtigian et al 2018. Put simply, it outputs the probability that a particular variant is pathogenic, based on the ACMG criteria. The defaults Prior_P = 0.1, exponent X = 2 and OPVST = 350 are used.

b) By-Sample report

This report will contain the 5 most pathogenic variants per sample. E.g.

Sample 1
Chr | Start | End | Ref | Alt | Func.refGene | Gene.refGene | ExonicFunc.refGene | Probability_Path | Prediction_ACMG
11 | 119216248 | 119216248 | G | A | exonic | MFRP | stopgain | 0.9971 | Pathogenic
17 | 59152382 | 59152382 | G | T | splicing | BCAS3 | . | 0.9971 | Pathogenic
9 | 139324777 | 139324777 | C | T | exonic | INPP5E | nonsynonymous SNV | 0.9941 | Pathogenic
2 | 54040161 | 54040161 | A | C | exonic | ERLEC1 | nonsynonymous SNV | 0.9749 | Likely Pathogenic
11 | 16863238 | 16863238 | T | C | exonic | PLEKHA7 | nonsynonymous SNV | 0.9492 | Likely Pathogenic

Sample 2
Chr | Start | End | Ref | Alt | Func.refGene | Gene.refGene | ExonicFunc.refGene | Probability_Path | Prediction_ACMG_freesome
2 | 71351575 | 71351575 | G | A | exonic | MCEE | stopgain | 0.9986 | Pathogenic
1 | 197072867 | 197072867 | A | T | exonic | ASPM | stopgain | 0.9878 | Likely Pathogenic
17 | 76525627 | 76525627 | G | A | exonic | DNAH17 | nonsynonymous SNV | 0.9492 | Likely Pathogenic
8 | 1874564 | 1874564 | C | A | exonic | ARHGEF10 | nonsynonymous SNV | 0.8999 | Likely Pathogenic
16 | 333220 | 333220 | G | A | exonic | PDIA2 | synonymous SNV | 0.8999 | Likely Pathogenic

Sample 3
Chr | Start | End | Ref | Alt | Func.refGene | Gene.refGene | ExonicFunc.refGene | Probability_Path | Prediction_ACMG_freesome
17 | 7125591 | 7125591 | T | C | exonic | ACADVL | nonsynonymous SNV | 0.9941 | Pathogenic
6 | 114379184 | 114379184 | G | A | exonic | HS3ST5 | nonsynonymous SNV | 0.9749 | Likely Pathogenic

Sample 4
Chr | Start | End | Ref | Alt | Func.refGene | Gene.refGene | ExonicFunc.refGene | Probability_Path | Prediction_ACMG_freesome
1 | 200549381 | 200549381 | C | T | splicing | KIF14 | . | 0.9971 | Pathogenic
17 | 7129566 | 7129566 | C | T | exonic | DVL2 | nonsynonymous SNV | 0.9492 | Likely Pathogenic
11 | 16863238 | 16863238 | T | C | exonic | PLEKHA7 | nonsynonymous SNV | 0.9492 | Likely Pathogenic
1 | 70881670 | 70881670 | C | T | exonic | CTH | nonsynonymous SNV | 0.8999 | Pathogenic
7 | 128845521 | 128845521 | G | A | exonic | SMO | nonsynonymous SNV | 0.8999 | Likely Pathogenic

c) By-Gene report

TTN    7.9087    LONG GENE
Chr | Start | End | Ref | Alt | Func.refGene | Gene.refGene | Probability_Path | Prediction_ACMG | Samples
2 | 179411522 | 179411522 | G | A | exonic | TTN | 0.9749 | Likely Pathogenic | BE_sample, BK_sample
2 | 179396978 | 179396978 | G | A | exonic | TTN | 0.9492 | Likely Pathogenic | BE_sample, BK_sample
2 | 179590714 | 179590714 | T | A | exonic | TTN | 0.8121 | Likely Pathogenic | BA_sample
2 | 179605212 | 179605212 | C | T | exonic | TTN | 0.8121 | Likely Benign | BE_sample, BK_sample, BY_sample, T_sample

MUTYH    6.7496
Chr | Start | End | Ref | Alt | Func.refGene | Gene.refGene | Probability_Path | Prediction_ACMG | Samples
1 | 45798627 | 45798627 | C | T | exonic | MUTYH | 0.9986 | Pathogenic | BE_sample
1 | 45797228 | 45797228 | C | T | exonic | MUTYH | 0.9878 | Pathogenic | BR_sample, T_sample
1 | 45798475 | 45798475 | T | C | exonic | MUTYH | 0.9878 | Pathogenic | BR_sample, T_sample
1 | 45796257 | 45796257 | C | T | intronic | MUTYH | 0.8999 | Likely Pathogenic | BE_sample, BF_sample

DNAH17    5.6738
Chr | Start | End | Ref | Alt | Func.refGene | Gene.refGene | Probability_Path | Prediction_ACMG | Samples
17 | 76486850 | 76486850 | G | A | exonic | DNAH17 | 0.9878 | Likely Pathogenic | BE_sample, BK_sample
17 | 76525627 | 76525627 | G | A | exonic | DNAH17 | 0.9492 | Likely Pathogenic | AH_sample, BV_sample
17 | 76498689 | 76498689 | T | C | exonic | DNAH17 | 0.8999 | Likely Pathogenic | BE_sample, BK_sample

Every table has, above the header, the name of the gene, the gene burden score and, in certain cases, a warning.

d) EnrichR report

Rank | Name | P-value | Z-score | Combined score | genes | adjusted p-values
1 | intraciliary retrograde transport (GO:0035721) | 1.54409E-07 | -2.680773475 | 42.04434394 | ['ICK', 'DYNC2LI1', 'IFT43', 'TTC21B', 'IFT122', 'TTC21A', 'WDR35'] | 0.000492719
2 | DNA strand elongation involved in DNA replication (GO:0006271) | 7.06476E-06 | -2.485462184 | 29.47855495 | ['GINS1', 'RFC4', 'LIG1', 'PARP2', 'LIG4', 'LIG3', 'POLE'] | 0.008710093
3 | short-chain fatty acid catabolic process (GO:0019626) | 0.000536672 | -3.426835601 | 25.80449662 | ['MCEE', 'PCCB', 'MUT', 'PCK2'] | 0.159775968
4 | base-excision repair (GO:0006284) | 8.18874E-06 | -1.893262125 | 22.17530614 | ['WRN', 'LIG1', 'NTHL1', 'OGG1', 'POLL', 'LIG3', 'ERCC6', 'POLE', 'TP53', 'MUTYH'] | 0.008710093
5 | carbohydrate catabolic process (GO:0016052) | 3.64323E-05 | -2.144171463 | 21.91354774 | ['HK3', 'PKLR', 'MAN2B2', 'NAGA', 'MAN2C1', 'PGK2', 'ENO2', 'PFKM', 'PGM1'] | 0.029063896
6 | protein deglycosylation (GO:0006517) | 0.001301574 | -3.128363521 | 20.78541288 | ['MAN2A2', 'MAN2B2', 'MAN2C1', 'ENGASE'] | 0.188745456
7 | myosin filament assembly (GO:0031034) | 0.00014685 | -2.308310721 | 20.37337836 | ['MYBPC2', 'MYBPHL', 'MYBPH', 'MYOM2', 'TTN'] | 0.078099737
8 | striated muscle myosin thick filament assembly (GO:0071688) | 0.000342639 | -2.480528447 | 19.79172169 | ['MYBPC2', 'MYBPHL', 'MYBPH', 'MYOM2', 'TTN'] | 0.156194492
9 | lagging strand elongation (GO:0006273) | 0.000536672 | -2.355809806 | 17.7395397 | ['LIG1', 'PARP2', 'LIG4', 'LIG3'] | 0.159775968
10 | mannose metabolic process (GO:0006013) | 0.005072831 | -3.293923071 | 17.4046157 | ['MAN2A2', 'MAN2B2', 'MAN2C1'] | 0.370626972
11 | sarcomere organization (GO:0045214) | 0.000652493 | -2.289434136 | 16.792337 | ['MYBPC2', 'MYBPHL', 'MYBPH', 'CAPN3', 'MYOM2', 'MYH6', 'TTN'] | 0.159775968

The 11 most relevant pathways will be in the EnrichR report. Only pathways with significant adjusted p-values should be considered.

e) Kegg, List and Disease reports

The Kegg, List and Disease reports look very similar to the main output. The Kegg and List reports only show variants that belong either to a given KEGG pathway (see list in appendix) or to a list of user-provided genes.
The Disease report only shows variants that have a certain term in the Disease column of the annotation, e.g. "Autosomal dominant", "cancer", "Colorectal".

3) ANNOVAR INTERFACE

Note that TAPES accepts the following for annotation: vcf files, bcf files, bgzipped vcf files and gzipped bcf files. They will be automatically converted to vcf files prior to annotation.

Users should also have downloaded ANNOVAR first (free for non-commercial use): http://www.openbioinformatics.org/annovar/annovar_download_form.php

1) First Use

When using TAPES for the first time, you need to indicate the location of your local ANNOVAR folder:

    python tapes.py db -s -A /path/to/annovar/

The -s stands for --see-db, a flag used to see all databases present on your system.
The output should look like this: [screenshot not reproduced here]

2) Simplified database management and annotation: using the --acmg flag

a) DOWNLOADING DATABASES

Use db -b --acmg or db --build_db --acmg to start downloading the necessary databases for the ACMG criteria assignment.
You can specify the assembly to use (either hg19 or hg38) with the --assembly option (default is hg19).

The necessary databases for all possible criteria assignments are:
- gnomad_genome
- gnomad_exome or exac03 (gnomad_exome is the default)
- avsnp150
- clinvar_20180603
- dbnsfp35c
- one of the genome annotations: refGene, ensGene, knownGene

    python tapes.py db -b --acmg --assembly hg19

This command will download the databases into the /humandb directory located in the ANNOVAR folder.
You can then check that all the databases have been downloaded using:

    python tapes.py db -s

b) ANNOTATING VCF FILE

To annotate a VCF file, use the annotate option with the --acmg flag to easily annotate your vcf with all the relevant databases for ACMG classification. Once again, use --assembly to specify the assembly version.

    python tapes.py annotate -i /path/to/file.vcf -o /path/to/output.csv --acmg --assembly hg19
This will produce the annotated file output.csv and, if the vcf is multi-sample, the file output_with_samples.csv will also be created.

    python tapes.py annotate -i /path/to/file.vcf -o /path/to/output.txt --acmg --assembly hg19
This will produce the annotated file output.txt and, if the vcf is multi-sample, the file output_with_samples.txt will also be created.

    python tapes.py annotate -i /path/to/file.vcf -o /path/to/output.vcf --acmg --assembly hg19
This will produce the annotated file output.vcf.

3) Advanced database management and annotation

a) DATABASE MANAGEMENT

TAPES provides two files to easily manage databases and ANNOVAR annotations. Those files are generated after the first use.
db_config.json is an easily readable json file which shows all (most of the) available ANNOVAR databases. Missing databases are flagged "MISSING" and downloaded databases are flagged "OK". To flag a database for download, replace "MISSING" by "DOWNLOAD" or "DOWN". Then run:

    python tapes.py db -b

This will download all databases flagged for download.
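Editing db_config.json by hand works for a few databases; for longer lists the flagging step could be scripted. The sketch below assumes, purely for illustration, that the file is a flat JSON object mapping each database name to its flag. The actual layout generated by TAPES may differ (for example, it could be nested per assembly), so inspect the file before reusing this. The function name flag_for_download is hypothetical.

```python
import json


def flag_for_download(config_path, databases):
    """Switch the listed databases from "MISSING" to "DOWNLOAD".

    Assumption: the JSON file is a flat {database_name: flag} mapping.
    Databases that are absent or not flagged "MISSING" are left untouched.
    Returns the names that were actually changed.
    """
    with open(config_path) as handle:
        config = json.load(handle)

    changed = []
    for name in databases:
        if config.get(name) == "MISSING":
            config[name] = "DOWNLOAD"
            changed.append(name)

    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return changed
```

After calling, for instance, flag_for_download("db_config.json", ["clinvar_20180603"]), running python tapes.py db -b would pick up the newly flagged databases as described above.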
b) ANNOTATION

db_vcf.json is an easily readable json file which shows all downloaded databases and which of them are used to annotate vcf files. Databases flagged "YES" will be used for annotation and databases flagged "NO" will be ignored. Flag "YES" for all databases you want to use for annotation, then run:

    python tapes.py annotate -i /path/to/file.vcf -o /path/to/output.csv
This will output two files: a standard annotated output.csv file and an output_with_samples.csv containing sample genotyping data.

    python tapes.py annotate -i /path/to/file.vcf -o /path/to/output.txt
This will output two files: a standard annotated output.txt file and an output_with_samples.txt containing sample genotyping data.

    python tapes.py annotate -i /path/to/file.vcf -o /path/to/output.vcf
This will produce the annotated file output.vcf.

c) ANNOTATION OPTIONS

Option | Type | Description | Default
--assembly | str | Assembly version: either hg19 or hg38 | hg19
--ref_anno | str | Genome annotation: either refGene for RefSeq, ensGene for ENSEMBL or knownGene for UCSC | refGene

4) DECOMPOSING VCF

TAPES will automatically decompose VCF files before annotation. TAPES can also decompose a VCF file without annotating it, using:

    python tapes.py decompose -i /original.vcf -o /decomposed.vcf

5) RE-ANALYSING TAPES OUTPUTS

If you want to generate a report from a previously sorted file, you can use the analyse (or analyze) option. For example:

    python tapes.py analyse -i /path/to/sorted_output.txt -o /path/to/output_report.txt --by_sample
Will output a by-sample report

    python tapes.py analyse -i /path/to/sorted_output.txt -o /path/to/output_report.txt --by_gene
Will output a by-gene report

Please note that you can only output one report at a time. For example,

    python tapes.py analyse -i /path/to/sorted_output.txt -o /path/to/output_report.txt --by_gene --by_sample --enrichr --list "MLH1 MSH2 APC"

will not work.
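Because analyse produces only one report per run, several reports from the same sorted file require one invocation each. The sketch below builds one command line per report flag without executing anything. The helper name build_analyse_commands and the output file naming scheme are hypothetical; only the analyse option and the --by_sample and --by_gene flags come from this manual.

```python
from pathlib import Path

# Report flags documented in this manual; each analyse run accepts one of them.
REPORT_FLAGS = ("--by_sample", "--by_gene")


def build_analyse_commands(sorted_file, output_dir, reports=REPORT_FLAGS):
    """Return one `tapes.py analyse` argument list per requested report.

    The output file names (e.g. by_sample_report.txt) are an illustrative
    convention, not something TAPES prescribes.
    """
    commands = []
    for flag in reports:
        report_name = flag.lstrip("-") + "_report.txt"
        commands.append([
            "python", "tapes.py", "analyse",
            "-i", str(sorted_file),
            "-o", str(Path(output_dir) / report_name),
            flag,
        ])
    return commands


for command in build_analyse_commands("/path/to/sorted_output.txt", "/path/to/reports"):
    print(" ".join(command))
```

Each returned list can be passed to subprocess.run in turn, so that every report is written to its own file.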
APPENDIX

KEGG Pathways keys

2-oxocarboxylic acid metabolism
abc transporters
acute myeloid leukemia
adherens junction
adipocytokine signaling pathway
adrenergic signaling in cardiomyocytes
african trypanosomiasis
age-rage signaling pathway in diabetic complications
alanine, aspartate and glutamate metabolism
alcoholism
aldosterone synthesis and secretion
aldosterone-regulated sodium reabsorption
allograft rejection
alpha-linolenic acid metabolism
alzheimer disease
amino sugar and nucleotide sugar metabolism
aminoacyl-trna biosynthesis
amoebiasis
amphetamine addiction
ampk signaling pathway
amyotrophic lateral sclerosis
antifolate resistance
antigen processing and presentation
apelin signaling pathway
apoptosis
apoptosis - multiple species
arachidonic acid metabolism
arginine and proline metabolism
arginine biosynthesis
arrhythmogenic right ventricular cardiomyopathy
ascorbate and aldarate metabolism
asthma
autoimmune thyroid disease
autophagy - animal
autophagy - other
axon guidance
b cell receptor signaling pathway
bacterial invasion of epithelial cells
basal cell carcinoma
basal transcription factors
base excision repair
beta-alanine metabolism
bile secretion
biosynthesis of amino acids
biosynthesis of unsaturated fatty acids
biotin metabolism
bladder cancer
breast cancer
butanoate metabolism
c-type lectin receptor signaling pathway
caffeine metabolism
calcium signaling pathway
camp signaling pathway
carbohydrate digestion and absorption
carbon metabolism
cardiac muscle contraction
cell adhesion molecules
cell cycle
cellular senescence
central carbon metabolism in cancer
cgmp-pkg signaling pathway
chagas disease
chemical carcinogenesis
chemokine signaling pathway
cholesterol metabolism
choline metabolism in cancer
cholinergic synapse
chronic myeloid leukemia
circadian entrainment
circadian rhythm
citrate cycle
cocaine addiction
collecting duct acid secretion
colorectal cancer
complement and coagulation cascades
cortisol synthesis and secretion
cushing syndrome
cysteine and methionine metabolism
cytokine-cytokine receptor interaction
cytosolic dna-sensing pathway
d-arginine and d-ornithine metabolism
d-glutamine and d-glutamate metabolism
dilated cardiomyopathy
dna replication
dopaminergic synapse
drug metabolism cytochrome p450
drug metabolism - other enzymes
ecm-receptor interaction
egfr tyrosine kinase inhibitor resistance
endocrine and other factor-regulated calcium reabsorption
endocrine resistance
endocytosis
endometrial cancer
epithelial cell signaling in helicobacter pylori infection
epstein-barr virus infection
erbb signaling pathway
estrogen signaling pathway
ether lipid metabolism
fanconi anemia pathway
fat digestion and absorption
fatty acid biosynthesis
fatty acid degradation
fatty acid elongation
fatty acid metabolism
fc epsilon ri signaling pathway
fc gamma r-mediated phagocytosis
ferroptosis
fluid shear stress and atherosclerosis
focal adhesion
folate biosynthesis
foxo signaling pathway
fructose and mannose metabolism
gabaergic synapse
galactose metabolism
gap junction
gastric acid secretion
gastric cancer
glioma
glucagon signaling pathway
glutamatergic synapse
glutathione metabolism
glycerolipid metabolism
glycerophospholipid metabolism
glycine, serine and threonine metabolism
glycolysis / gluconeogenesis
glycosaminoglycan biosynthesis - chondroitin sulfate / dermatan sulfate
glycosaminoglycan biosynthesis - heparan sulfate / heparin
glycosaminoglycan biosynthesis - keratan sulfate
glycosaminoglycan degradation
glycosphingolipid biosynthesis ganglio series
glycosphingolipid biosynthesis globo and isoglobo series
glycosphingolipid biosynthesis lacto and neolacto series
glycosylphosphatidylinositol
glyoxylate and dicarboxylate metabolism
gnrh signaling pathway
graft-versus-host disease
hedgehog signaling pathway
hematopoietic cell lineage
hepatitis b
hepatitis c
hepatocellular carcinoma
herpes simplex infection
hif-1 signaling pathway
hippo signaling pathway
hippo signaling pathway multiple species
histidine metabolism
homologous recombination
human cytomegalovirus infection
human immunodeficiency virus 1 infection
human papillomavirus infection
human t-cell leukemia virus 1 infection
huntington disease
hypertrophic cardiomyopathy
il-17 signaling pathway
inflammatory bowel disease
inflammatory mediator regulation of trp channels
influenza a
inositol phosphate metabolism
insulin resistance
insulin secretion
insulin signaling pathway
intestinal immune network for iga production
jak-stat signaling pathway
kaposi sarcoma-associated herpesvirus infection
legionellosis
leishmaniasis
leukocyte transendothelial migration
linoleic acid metabolism
lipoic acid metabolism
long-term depression
long-term potentiation
longevity regulating pathway
longevity regulating pathway multiple species
lysine degradation
lysosome
malaria
mannose type o-glycan biosynthesis
mapk signaling pathway
maturity onset diabetes of the young
measles
melanogenesis
melanoma
metabolic pathways
metabolism of xenobiotics by cytochrome p450
micrornas in cancer
mineral absorption
mismatch repair
mitophagy - animal
morphine addiction
mrna surveillance pathway
mtor signaling pathway
mucin type o-glycan biosynthesis
n-glycan biosynthesis
natural killer cell mediated cytotoxicity
necroptosis
neomycin, kanamycin and gentamicin biosynthesis
neuroactive ligand-receptor interaction
neurotrophin signaling pathway
nf-kappa b signaling pathway
nicotinate and nicotinamide metabolism
nicotine addiction
nitrogen metabolism
nod-like receptor signaling pathway
non-alcoholic fatty liver disease
non-homologous end-joining
non-small cell lung cancer
notch signaling pathway
nucleotide excision repair
olfactory transduction
one carbon pool by folate
oocyte meiosis
osteoclast differentiation
other glycan degradation
other types of o-glycan biosynthesis
ovarian steroidogenesis
oxidative phosphorylation
oxytocin signaling pathway
p53 signaling pathway
pancreatic cancer
pancreatic secretion
pantothenate and coa biosynthesis
parathyroid hormone synthesis, secretion and action
parkinson disease
pathogenic escherichia coli infection
pathways in cancer
pentose and glucuronate interconversions
pentose phosphate pathway
peroxisome
pertussis
phagosome
phenylalanine metabolism
phenylalanine, tyrosine and tryptophan biosynthesis
phosphatidylinositol signaling system
phospholipase d signaling pathway
phosphonate and phosphinate metabolism
phototransduction
pi3k-akt signaling pathway
platelet activation
platinum drug resistance
porphyrin and chlorophyll metabolism
ppar signaling pathway
primary bile acid biosynthesis
primary immunodeficiency
prion diseases
progesterone-mediated oocyte maturation
prolactin signaling pathway
propanoate metabolism
prostate cancer
proteasome
protein digestion and absorption
protein export
protein processing in endoplasmic reticulum
proteoglycans in cancer
proximal tubule bicarbonate reclamation
purine metabolism
pyrimidine metabolism
pyruvate metabolism
rap1 signaling pathway
ras signaling pathway
regulation of actin cytoskeleton
regulation of lipolysis in adipocytes
relaxin signaling pathway
renal cell carcinoma
renin secretion
renin-angiotensin system
retinol metabolism
retrograde endocannabinoid signaling
rheumatoid arthritis
riboflavin metabolism
ribosome
ribosome biogenesis in eukaryotes
rig-i-like receptor signaling pathway
rna degradation
rna polymerase
rna transport
salivary secretion
salmonella infection
selenocompound metabolism
serotonergic synapse
shigellosis
signaling pathways regulating pluripotency of stem cells
small cell lung cancer
snare interactions in vesicular transport
sphingolipid metabolism
sphingolipid signaling pathway
spliceosome
staphylococcus aureus infection
starch and sucrose metabolism
steroid biosynthesis
steroid hormone biosynthesis
sulfur metabolism
sulfur relay system
synaptic vesicle cycle
synthesis and degradation of ketone bodies
systemic lupus erythematosus
t cell receptor signaling pathway
taste transduction
taurine and hypotaurine metabolism
terpenoid backbone biosynthesis
tgf-beta signaling pathway
th1 and th2 cell differentiation
th17 cell differentiation
thermogenesis
thiamine metabolism
thyroid cancer
thyroid hormone signaling pathway
thyroid hormone synthesis
tight junction
tnf signaling pathway
toll-like receptor signaling pathway
toxoplasmosis
transcriptional misregulation in cancer
tryptophan metabolism
tuberculosis
type i diabetes mellitus
type ii diabetes mellitus
tyrosine metabolism
ubiquinone and other terpenoid-quinone biosynthesis
ubiquitin mediated proteolysis
valine, leucine and isoleucine biosynthesis
valine, leucine and isoleucine degradation
vascular smooth muscle contraction
vasopressin-regulated water reabsorption
vegf signaling pathway
vibrio cholerae infection
viral carcinogenesis
viral myocarditis
vitamin b6 metabolism
vitamin digestion and absorption
wnt signaling pathway

EnrichR Libraries

Genes_Associated_with_NIH_Grants
Cancer_Cell_Line_Encyclopedia
Achilles_fitness_decrease
Achilles_fitness_increase
Aging_Perturbations_from_GEO_down
Aging_Perturbations_from_GEO_up
Allen_Brain_Atlas_down
Allen_Brain_Atlas_up
ARCHS4_Cell-lines
ARCHS4_IDG_Coexp
ARCHS4_Kinases_Coexp
ARCHS4_TFs_Coexp
ARCHS4_Tissues
BioCarta_2013
BioCarta_2015
BioCarta_2016
BioPlex_2017
ChEA_2013
ChEA_2015
ChEA_2016
Chromosome_Location
Chromosome_Location_hg19
CORUM
Data_Acquisition_Method_Most_Popular_Genes
dbGaP
Disease_Perturbations_from_GEO_down
Disease_Perturbations_from_GEO_up
Disease_Signatures_from_GEO_down_2014
Disease_Signatures_from_GEO_up_2014
Drug_Perturbations_from_GEO_2014
Drug_Perturbations_from_GEO_down
Drug_Perturbations_from_GEO_up
DrugMatrix
DSigDB
ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X
ENCODE_Histone_Modifications_2013
ENCODE_Histone_Modifications_2015
ENCODE_TF_ChIP-seq_2014
ENCODE_TF_ChIP-seq_2015
Enrichr_Libraries_Most_Popular_Genes
Enrichr_Submissions_TFGene_Coocurrence
Epigenomics_Roadmap_HM_ChIP-seq
ESCAPE
GeneSigDB
Genome_Browser_PWMs
GO_Biological_Process_2013
GO_Biological_Process_2015
GO_Biological_Process_2017
GO_Biological_Process_2017b
GO_Biological_Process_2018
GO_Cellular_Component_2013
GO_Cellular_Component_2015
GO_Cellular_Component_2017
GO_Cellular_Component_2017b
GO_Cellular_Component_2018
GO_Molecular_Function_2013
GO_Molecular_Function_2015
GO_Molecular_Function_2017
GO_Molecular_Function_2017b
GO_Molecular_Function_2018
GTEx_Tissue_Sample_Gene_Expression_Profiles_down
GTEx_Tissue_Sample_Gene_Expression_Profiles_up
HMDB_Metabolites
HomoloGene
Human_Gene_Atlas
Human_Phenotype_Ontology
HumanCyc_2015
HumanCyc_2016
huMAP
Jensen_COMPARTMENTS
Jensen_DISEASES
Jensen_TISSUES
KEA_2013
KEA_2015
KEGG_2013
KEGG_2015
KEGG_2016
Kinase_Perturbations_from_GEO_down
Kinase_Perturbations_from_GEO_up
Ligand_Perturbations_from_GEO_down
Ligand_Perturbations_from_GEO_up
LINCS_L1000_Chem_Pert_down
LINCS_L1000_Chem_Pert_up
LINCS_L1000_Kinase_Perturbations_down
LINCS_L1000_Kinase_Perturbations_up
LINCS_L1000_Ligand_Perturbations_down
LINCS_L1000_Ligand_Perturbations_up
MCF7_Perturbations_from_GEO_down
MCF7_Perturbations_from_GEO_up
MGI_Mammalian_Phenotype_2013
MGI_Mammalian_Phenotype_2017
MGI_Mammalian_Phenotype_Level_3
MGI_Mammalian_Phenotype_Level_4
Microbe_Perturbations_from_GEO_down
Microbe_Perturbations_from_GEO_up
miRTarBase_2017
Mouse_Gene_Atlas
MSigDB_Computational
MSigDB_Oncogenic_Signatures
NCI-60_Cancer_Cell_Lines
NCI-Nature_2015
NCI-Nature_2016
NURSA_Human_Endogenous_Complexome
Old_CMAP_down
Old_CMAP_up
OMIM_Disease
OMIM_Expanded
Panther_2015
Panther_2016
Pfam_InterPro_Domains
Phosphatase_Substrates_from_DEPOD
PPI_Hub_Proteins
Reactome_2013
Reactome_2015
Reactome_2016
RNASeq_Disease_Gene_and_Drug_Signatures_from_GEO
SILAC_Phosphoproteomics
Single_Gene_Perturbations_from_GEO_down
Single_Gene_Perturbations_from_GEO_up
SysMyo_Muscle_Gene_Sets
TargetScan_microRNA
TargetScan_microRNA_2017
TF-LOF_Expression_from_GEO
TF_Perturbations_Followed_by_Expression
Tissue_Protein_Expression_from_Human_Proteome_Map
Tissue_Protein_Expression_from_ProteomicsDB
Transcription_Factor_PPIs
TRANSFAC_and_JASPAR_PWMs
Virus_Perturbations_from_GEO_down
Virus_Perturbations_from_GEO_up
VirusMINT
WikiPathways_2013
WikiPathways_2015
WikiPathways_2016

ACMG Criteria assignment
(refer to S Richards et al - 2015 for a description of the criteria)

Pathogenic Criteria

PVS1
Will be assigned to a variant if it is a stopgain or frameshift deletion/insertion located more than 50 bp from the end of the final exon. (Based on the ExonicFunc column and the REK_canon library)
Will be assigned to a splicing variant with a dbscSNV score of more than 0.6 (ADA or RF). (Based on the Func column and the dbscSNV score annotation)

PS1
Will be assigned if a variant has the same AA ref and AA alt as a known pathogenic variant, using all known pathogenic variants from clinvar.

PS2
Will be assigned if a variant is assumed de novo and the parents are disease free. This requires trio data.

PS3
Will be assigned if clinvar classifies the variant as Pathogenic or drug response and the level of evidence is either 'practice guideline' or 'reviewed by expert panel'.

PS4
Will be assigned if a variant is enriched in the samples provided. Requires either 'output_with_samples.csv' from the annotation, to keep sample genotyping data, or an annotated multi-sample vcf.
PS4 takes the affected individuals with the mutation and the total number of individuals in the disease cohort and compares them to the data from gnomad_genome and gnomad_exome.
The number of individuals with and without variants in public data is extrapolated with the following formulas:

Minor allele frequency in control population (MAF):
    MAF_c = y \times 10^{-x}
Number of individuals with the variant in control population:
    n_c = \lceil y \rceil
Total number of individuals in control population:
    N_c = \frac{10^{x}}{2} - n_c

Then a Fisher's exact test is performed to calculate the odds ratio, the confidence interval and the p-value.
PS4 will only be considered if at least 2 samples are affected by a variant. Otherwise, the InterVar PS4 database, based on the GWAS database, will be used. PS4 will be assigned if the odds ratio is greater than 20, the confidence interval does not cross one and the p value is under 0.01.

PM1: Will be assigned if the variant is a missense variant (nonsynonymous SNV) and is located in a domain without benign variants (using the InterVar database of benign domains).

PM2: Will be assigned if the variant is in a recessive gene and has a frequency under 0.005, or is in a dominant gene and has no frequency data available. Recessive and dominant/haploinsufficient genes were inferred using the pLI and pRec scores computed by Lek et al, 2016. A gene is considered dominant if pLI > 0.85 and recessive if pRec > 0.85.

PM4: Will be assigned if the variant is an in-frame deletion/insertion in a non-repeat region of the gene (using the repeat_dict database).

PM5: Will be assigned if a variant has the same AA ref and a different AA alt as a known pathogenic variant, using all known pathogenic variants from ClinVar.

PP2: Will be assigned if the variant is missense (nonsynonymous SNV) in a gene where missense variants represent at least 80 percent of all known pathogenic variants (using the PP2_BP1 database).

PP3: Will be assigned if the variant is predicted to be pathogenic by various in-silico prediction tools (sift, lrt, mutationtaster, mutation assessor, fathmm, provean, meta svm, meta lr, mcap, mkl, genocanyon, gerp).

PP5: Will be assigned if the variant is classified as pathogenic or likely pathogenic by ClinVar but the evidence is limited.

Benign Criteria

BA1: Will be assigned to a variant if its frequency in gnomad_exome/exac or gnomad_genome is greater than 0.05.

BS1: Will be assigned to a variant if its frequency is greater than a cutoff (0.005) for a rare disease.
BS2: Will be assigned if the variant was observed in a healthy individual as homozygous for a recessive disease or heterozygous for a dominant disease (using the InterVar database BS2_hom_het).

BS3: Will be assigned if ClinVar classifies the variant as Benign or likely benign and the level of evidence is either ‘practice guideline’ or ‘reviewed by expert panel’.

BP1: Will be assigned if the variant is missense (nonsynonymous SNV) in a gene where missense variants represent at most 10 percent of all known pathogenic variants (using the PP2_BP1 database).

BP3: Will be assigned if the variant is an in-frame deletion/insertion in a repeat region of the gene (using the repeat_dict database).

BP4: Will be assigned if the variant is predicted to be benign by various in-silico prediction tools (sift, lrt, mutationtaster, mutation assessor, fathmm, provean, meta svm, meta lr, mcap, mkl, genocanyon, gerp).

BP6: Will be assigned if the variant is classified as Benign or likely benign by ClinVar but the evidence is limited.

BP7: Will be assigned if a variant is synonymous and no splicing impact is predicted by dbscSNV (score under 0.6).
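The PS4 enrichment test described under the pathogenic criteria can be illustrated with a short Python sketch. It is not the TAPES source code: the function names are hypothetical, N_c is assumed to count the control individuals without the variant (because n_c is subtracted from it), and the confidence interval uses the Woolf log method with a Haldane-Anscombe 0.5 correction for empty cells, since the manual does not say which interval is used. The fallback to the InterVar PS4 database is not modelled.

```python
import math
from decimal import Decimal, ROUND_CEILING


def extrapolate_controls(maf):
    """Return (n_c, N_c) from a control MAF written as y * 10**-x."""
    d = Decimal(str(maf))
    if not 0 < d < 1:
        raise ValueError("maf must be strictly between 0 and 1")
    x = -d.adjusted()                 # exponent x such that 1 <= y < 10
    y = d.scaleb(x)                   # mantissa y
    n_c = int(y.to_integral_value(rounding=ROUND_CEILING))  # n_c = ceil(y)
    big_n_c = 10 ** x // 2 - n_c      # N_c = 10**x / 2 - n_c
    return n_c, big_n_c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k carriers among the cases.
        return math.comb(col1, k) * math.comb(n - col1, row1 - k) / denom

    p_obs = prob(a)
    k_min, k_max = max(0, row1 - (n - col1)), min(row1, col1)
    total = sum(p for p in map(prob, range(k_min, k_max + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf 95% interval (assumed method, see lead-in)."""
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction so the ratio and its log are defined.
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    odds = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds, math.exp(math.log(odds) - z * se), math.exp(math.log(odds) + z * se)


def ps4(affected_with_variant, cohort_size, control_maf):
    """Apply the thresholds from the manual: OR > 20, CI above 1, p < 0.01."""
    if affected_with_variant < 2:
        return False                  # manual: InterVar PS4 database is used instead
    n_c, big_n_c = extrapolate_controls(control_maf)
    a, b = affected_with_variant, cohort_size - affected_with_variant
    odds, low, _high = odds_ratio_ci(a, b, n_c, big_n_c)
    p_value = fisher_exact_two_sided(a, b, n_c, big_n_c)
    return odds > 20 and low > 1 and p_value < 0.01


if __name__ == "__main__":
    # 10 carriers in a cohort of 100, control MAF of 3 x 10^-4
    print(extrapolate_controls(0.0003))
    print(ps4(10, 100, 0.0003))
```

The MAF is parsed with `decimal` rather than `math.log10` so that a value such as 0.0003 yields exactly y = 3 and is not rounded up to 4 by floating-point error.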
Source Exif Data:
File Type: PDF
File Type Extension: pdf
MIME Type: application/pdf
PDF Version: 1.5
Linearized: No
Page Count: 14
Language: en-AU
Tagged PDF: Yes
Author: Alexandre Xavier
Creator: Microsoft® Word 2013
Create Date: 2019:03:20 10:57:15+11:00
Modify Date: 2019:03:20 10:57:15+11:00
Producer: Microsoft® Word 2013