Specter Spark User Guide
User Manual: Pdf
Open the PDF directly: View PDF .
Page Count: 2
Download | ![]() |
Open PDF In Browser | View PDF |
Specter: linear deconvolution as a new paradigm for targeted analysis of data-independent acquisition mass spectrometry User Guide for Spark Mode Ryan Peckner April 26, 2018 Specter is an algorithm for the targeted analysis of data-independent acquisition mass spectrometry experiments. It can analyze data from any instrument type and window acquisition scheme. The required user inputs are a DIA data file in centroided mzML format, a spectral library in blib format, and a mass accuracy parameter, specified in parts-per-million. The raw output of Specter is a .csv file describing the total ion intensities (= sum of fragment ion intensities) of each precursor in the spectral library at each retention time point of the experiment. A snippet of the typical ’SpecterCoeffs’ raw output file looks like this: Scan index 10032 10032 10033 Retention time (s) 268.3763 268.3763 268.4273 Precursor sequence ETLDASLPSDYLK NPAADAGSNNASKK IVLVDDSIVR Precursor charge 2 2 2 Total ion intensity 1,569,034 3,112,580 722,175 while the ’SpecterQuants’ file contains the identifications and quantifications based on these scan-by-scan total ion intensities: Precursor sequence ETLDASLPSDYLK NPAADAGSNNASKK IVLVDDSIVR Precursor charge 2 2 2 Quant 148,110,338 32,234,856 11,768,772 System requirements Specter requires a computing cluster with Apache Spark v. >= 1.6 (with the PySpark API), Anaconda v. >= 4.2, and Python version >= 2.7.9 (Specter hasn’t been tested with Python 3). At least 100 GB of cluster RAM is recommended. R version >=3.2.3 is also required, with the packages ’kza’,’pracma’ and ’moments’ installed. Setup Download the contents of the GitHub repository to a location accessible to the Spark cluster you’ll be using to run Specter. From the command line of the Spark cluster, cd to this directory. Now create a conda environment called "SpecterEnv" with the commands $ conda env create -f SpecterEnv .yml $ source activate SpecterEnv Due to a bug in the package pymzml, which Specter uses to parse the mzML mass spec data files, you will need to download the most recent MS bioontology file from http://data.bioontology.org/ontologies/MS/submissions/116/download?apikey= 8b5b7825-538d-40e0-9e9e-5ab9274a9aeb, rename it as “psi-ms-4.0.1.obo", and manually move it to the pymzml obo repository located in /usr/anaconda/envs/SpecterEnv/lib/python2.7/site-packages/pymzml/obo (this path may need to be altered depending on where your Anaconda install lives). This issue has been flagged with the pymzml developers but has yet to be resolved. Next, the Spark executor nodes’ Python path must be pointed to SpecterEnv so that Spark can access the various packages needed for Specter: $ sudo export PYSPARK_DRIVER_PYTHON =/ usr/ anaconda /envs/ SpecterEnv /bin/ python $ sudo export PYSPARK_PYTHON =/ usr/ anaconda /envs/ SpecterEnv /bin/ python The path /usr/anaconda/envs/SpecterEnv may vary depending on your settings for Anaconda; the appropriate value can be found by running conda info -e at the command line to locate SpecterEnv. 1 Mass spectrometry data and spectral libraries All mass spec data has to be provided in centroided mzML format. ProteoWizard’s msconvert application is recommended to convert raw mass spec data to mzML and perform centroiding, including for AB SCIEX TripleTOF data in .wiff format. Spectral libraries must be provided in .blib format. Skyline can create a .blib spectral library from most DDA search engine output formats; for a tutorial on how to do this from a pep.xml file, see http://dia-swath-course.ethz.ch/tutorials2014/ Tutorial-3_Library.pdf. The list of supported search engine output formats for conversion to blib by Skyline can be found at https://skyline.ms/wiki/home/software/BiblioSpec/page.view?name=BlibBuild. Warning: When building a spectral library from phosphoproteomics data, make sure to filter your search engine output so that only unambiguously localized phosphosites are included (using a localization score such as the A-score, PTM score, or variable modification location score from Spectrum Mill). This is necessary to avoid conflation of spectra from positional isomers. Running a Specter job Once the SpecterEnv conda environment has been activated, the syntax for running Specter is $ ./ Specter .shwhere the bracketed arguments are as follows: • mem: The amount of memory to be provisioned to the Spark driver node. • mzMLname: The full path to the mzML file containing the DIA data to be analyzed, without the mzML extension. • blibName: The name of the blib file containing the spectral library, without the blib extension. • index: The first or last index of the subset of MS2 spectra to be analyzed (see the next argument). • StartOrEnd: Should index be interpreted as the first (StartOrEnd = "start") or last (StartOrEnd = "end") index of the spectra to be analyzed? This is useful for breaking jobs into smaller pieces to respect cluster memory constraints. • numPartitions: The number of partitions Spark will use to parallelize the MS2 spectra. A reasonable starting choice is five times the number of cluster CPUs. • instrumentType: This can be one of ’orbitrap’,’tof’, or ’other’. Use of this argument in the first two cases helps avoid certain known issues with mzMLs coming from data converted from these instrument types. • tol: The instrument mass accuracy, in parts-per-million. For example, the command $ ./ Specter .sh 15g / rpeckner /data /20170501 _PhosphoDIARun1 / rpeckner /libs/ HumanPhosphoLib 100000 end 200 orbitrap 10 would tell Specter to analyze the first 100,000 MS2 spectra in the Orbitrap DIA experiment file PhosphoDIARun1.mzML located in /rpeckner/data/20170501 using the spectral library HumanPhosphoLib.blib located in /rpeckner/libs/. The remaining parameters are the mass tolerance of 10 p.p.m., 200 partitions of the associated Spark RDD (the elements of this RDD correspond to the individual MS2 spectra), and 15GB RAM available to the driver node. Specter will create a subdirectory ’SpecterResults’ of the directory containing the mzML, into which the output files will be written. These are the following csv files with the common prefix ‘ _ ’: • header: The header of the mzML file, containing the scan indices, retention times, and precursor m/z window centers. • SpecterCoeffs: The raw Specter coefficients for every library precursor and MS/MS scan. • SpecterCoeffsDecoys: The raw Specter coefficients for both library and decoy precursors and all MS/MS scans. • SpecterQuants: The final identifications and quantifications of library precursors filtered to a 1% false discovery rate. You can visualize the calculated chromatograms of individual library precursors in R using ggplot: library ( ggplot2 ) SpecterChromatograms <- read.csv(‘/ rpeckner /data/ SpecterResults / 20170501 _ PhosphoDIARun1 _ SpecterCoeffs .csv ’, header = FALSE ) names ( SpecterChromatograms ) <- c(" Coefficient "," Scan "," PeptideSeq ", " PrecursorCharge ", " WindowMZCenter "," RetentionTime ") ggplot ( subset ( SpecterChromatograms , PeptideSeq == " AEILLTYS [+80.0] K" & PrecursorCharge == 2), aes(x= RetentionTime ,y= Coefficient )) + geom_point () + geom_line () 2
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.5 Linearized : No Page Count : 2 Page Mode : UseOutlines Author : Title : Subject : Creator : LaTeX with hyperref package Producer : pdfTeX-1.40.17 Create Date : 2018:04:26 17:03:28-04:00 Modify Date : 2018:04:26 17:03:28-04:00 Trapped : False PTEX Fullbanner : This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) kpathsea version 6.2.2EXIF Metadata provided by EXIF.tools