Sa TAnn Manual

User Manual:

Open the PDF directly: View PDF .
Page Count: 27

Download
Open PDF In Browser	View PDF

Package ‘SaTAnn’
April 12, 2019
Title Splice-Aware Translatome Annotation
Version 0.99.0
Description
SaTAnn is a method that quantifies translation at the single ORF level using Ribo-seq data.
Depends rtracklayer, BSgenome, devtools, Biostrings, GenomicFeatures,
foreach, doMC, multitaper, GenomicAlignments, GenomicFiles,
reshape2, ggplot2, cowplot, grid, BiocGenerics, knitr,
gridExtra, rmarkdown
License GPL-3 or above
Encoding UTF-8
LazyData FALSE
Name SaTAnn
biocViews RiboSeq, GenomeAnnotation, Transcriptomics, Software
RoxygenNote 6.1.1
NeedsCompilation no
Author Lorenzo Calviello [aut, cre],
Uwe Ohler [rev, fnd]
Maintainer Lorenzo Calviello

R topics documented:
annotate_ORFs . . . . . . .
annotate_splicing . . . . . .
calc_orf_pval . . . . . . . .
create_SaTAnn_html_report
detect_readthrough . . . . .
detect_translated_orfs . . . .
from_tx_togen . . . . . . . .
get_orfs . . . . . . . . . . .
get_ps_fromsplicemin . . . .
get_ps_fromspliceplus . . .
get_reathr_seq . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
1

.
.
.
.
.
.
.
.
.
.
.

2
4
5
6
7
8
9
10
11
11
12

annotate_ORFs
load_annotation . . . . .
plot_SaTAnn_results . .
prepare_annotation_files
prepare_for_SaTAnn . .
run_SaTAnn . . . . . . .
SaTAnn . . . . . . . . .
select_quantify_ORFs . .
select_start . . . . . . .
select_txs . . . . . . . .
take_Fvals_spect . . . .

.
.
.
.
.
.
.
.
.
.

Index

annotate_ORFs

.
.
.
.
.
.
.
.
.
.

13
13
15
17
18
20
22
24
25
26
27

Annotate detected ORFs in transcript and genome space

Description
This function annotates quantified ORFs with respect to other detected ORFs and annotated ones,
in both genome and transcript space.
Usage
annotate_ORFs(results_ORFs, Annotation, genome_sequence, region,
genetic_code)
Arguments
results_ORFs

Full list of detected ORFs, from select_quantify_ORFs

Annotation
Rannot object containing annotation of CDS and transcript structures (see prepare_annotation_files
genome_sequence
BSgenome object
region

genomic region being analyzed

genetic_code

GENETIC_CODE table to use

Details
As multiple transcripts can contain the same ORF, all the transcript and transcript biotypes are
indicated, with a preference for protein_coding transcripts in the "compatible" columns (to be conservative when assessing translation of non-protein coding transcripts). Such compatibility is also
output considering the most upstream start codon for that ORF.
Splice features of each orf is annotated with respect to the longest coding transcripts and to the
highest translated ORF in that gene.
Variants in N or C terminus of the translated proteins are also indicated (Beta).
ORF annotation with respect to the annotated transcript is also indicated, as follows:
novel: no ORF annotated in the transcript.

annotate_ORFs

ORF_annotated: same exact ORF as annotated.
N_extension: N terminal extension.
N_truncation: N terminal extension.
uORF: upstream ORF.
overl_uORF: upstream overlappin uORF.
NC_extension: N and C termini extension.
dORF: downstream ORF.
overl_dORF: downstream overlapping ORF.
nested_ORF: nested ORF.
C_truncation: C terminal truncation.
C_extension: C terminal extension.
As transcipt-specific annotation can be misleading due to a plethora of different transcripts, it is
important to distinguish ORFs also on the basis of their overlap with know CDS regions. ORF
annotation with respect to the entire set of CDS exon for the analyzed genomic regions is indicated
as follows:
novel: No CDS region is annotated in the entire region.
novel_Upstream: ORF is upstream of annotated CDS regions (does not overlap).
novel_Downstream: ORF is downstream of annotated CDS regions (does not overlap).
novel_Internal: genomic location of the ORF is present between the start of the first, and the end
of the last CDS region (does not overlap).
exact_start_stop: Same start and end locations.
Alt5_start: Different start region, upstream.
Alt3_start: Different start region, downstream.
Alt5_stop: Different end region, upstream.
Alt3_stop: Different end region, downstream.
Another layer of annotation is performed by checking the position of the ORF stop codon with
respect to the last exon-exon junction.
Value
Exon structure of detected ORF including possible missing exons from reference, together with a
spl_type column including the annotation for each exon (e.g. alternative acceptors or donor).
Additional columns are added to the ORFs_tx object:
compatible_with: Set of transcript ids possibly containing the entire ORF structure.
compatible_biotype: Compatible transcript biotype; if a protein coding transcript can contain the
ORF, this is set to protein_coding.
compatible_tx: One selected compatible transcript (preference if protein_coding).
compatible_ORF_id_tr: ORF_id_tr id if selecting the compatible transcript.
compatible_with_longest: Same as compatible_with but using the most upstream start codon.
compatible_ORF_id_tr_longest: Same as compatible_ORF_id_tr but using the most upstream
start codon .
ref_id: transcript_id of the transcript used to annotate splicing (longest) .
ref_id_maxORF: ORF_id_tr of the ORF used to annotated splicing (most translated of the gene).
NC_protein_isoform: Annotation of possible N or C termini variant (when transcript is protein_coding) .

annotate_splicing
ORF_category_Tx: ORF annotation with respect to ORF position in the transcript .
ORF_category_Tx_compatible: ORF annotation with respect to ORF position in the transcript,
using the compatible_ORF_id_tr .
ORF_category_Gen: ORF annotation with respect to its genomic position .
NMD_candidate: TRUE or FALSE, depending on the presence of an additional exon-exon junction
downstream the stop codon.
Distance_to_lastExEx: Distance (in nt) between the last exon-exon junction and the stop codon.

Author(s)
Lorenzo Calviello,
See Also
select_quantify_ORFs, annotate_splicing

annotate_splicing

Annotate splice features of detected ORFs

Description
This function detects usage of different exons and exonic boundaries of one ORF with respect to a
reference ORF.
Usage
annotate_splicing(orf_gen, ref_cds)
Arguments
orf_gen

Exon structure of a detected ORF

ref_cds

Exon structure of a reference ORF

Details
each exon is aligned to the closest one to match acceptor and donor sites, or to annotate missing
exons. 5ss and 3ss indicate exon 5’ and 3’, respectively. CDS_spanning indicates retained intron;
missing_CDS indicates no overlapping exon (missed or included); monoCDS indicates a single-exon
ORF; firstCDS and lastCDS indicate first CDS exon or last CDS exon.
Value
Exon structure of detected ORF including possible missing exons from reference, together with a
spl_type column including the annotation for each exon (e.g. alternative acceptors or donor).
Author(s)
Lorenzo Calviello,

calc_orf_pval

See Also
detect_translated_orfs, annotate_ORFs

calc_orf_pval

Collect ORF Ribo-seq statistics

Description
This function calculates statistics for the analysis of P_sites profiles for each ORF
Usage
calc_orf_pval(ORFs, P_sites_rle, P_sites_uniq_rle, P_sites_uniq_mm_rle,
cutoff = 0.5, tapers = 24, bw = 12)
Arguments
ORFs

Set of detected ORFs

P_sites_rle
Rle signal of P_sites along the transcript
P_sites_uniq_rle
Rle signal of uniquely mapping P_sites along the transcript
P_sites_uniq_mm_rle
Rle signal of uniquely mapping P_sites with mismatches along the transcript
cutoff

cutoff of average in-frame signal for each codon in the ORF. Defaults to .5

tapers

Number of tapers to use in the multitaper analysis. Defaults to 24

time_bw parameter to use in the multitaper analysis. Defaults to 12

Details
Number of P_sites (uniquely mapping or all), frame percentage and multitaper test statistics are collected for each ORF. The parameter space for the multitaper analysis was explored in the RiboTaper
paper.
Value
Set of detected ORFs, including info about the possible longest ORF for that frame.
Author(s)
Lorenzo Calviello,
See Also
detect_translated_orfs, get_orfs, take_Fvals_spect

create_SaTAnn_html_report

create_SaTAnn_html_report
Create an html report summarizing SaTAnn results

Description
This function creates an html report showing summary statistics for SaTAnn-detected ORFs.
Usage
create_SaTAnn_html_report(input_files, input_sample_names, output_file)
Arguments
Character vector with full paths to plot files (*SaTAnn_plots_RData) generated
with plot_SaTAnn_results. Must be of same length as input_sample_names.
input_sample_names
Character vector containing input names. Must be of same length as input_files.
input_files

output_file

String; full path to html report file.

Details
This function creates the html report visualizing final SaTAnn results.
Input are two lists of the same length:
a) input_files: list of full paths to one or multiple input files (*SaTAnn_plots_RData files generated with plot_SaTAnn_results) and
b) input_sample_names: list of corresponding names describing the file content (these are used as
names in the report).
For the report, a RMarkdown file is rendered as html document, saved as output_file.

Value
The function saves the html report file with the file path output_file.
Author(s)
Lorenzo Calviello,
See Also
plot_SaTAnn_results, run_SaTAnn

detect_readthrough

Analyzed translation on possible readthrough regions (beta)

Description
This function uses the multitaper method to look for readthrough translation
Usage
detect_readthrough(results_orf, P_sites, P_sites_uniq, P_sites_uniq_mm,
genome_sequence, annotation, genetic_code_table, cutoff_fr_ave = 0.5)
Arguments
results_orf

Full list of detected ORFs, from select_quantify_ORFs and annotate_ORFs

P_sites

GRanges object with P_sites positions

P_sites_uniq
GRanges object with uniquely mapping P_sites positions
P_sites_uniq_mm
Rle signal of uniquely mapping P_sites with mismatches along the transcript
genome_sequence
BSgenome object
annotation
Rannot object containing annotation of CDS and transcript structures (see prepare_annotation_files)
genetic_code_table
GENETIC_CODE table to use
cutoff_fr_ave

cutoff parameter for the calc_orf_pval functions

Details
The function looks for stop-stop pairs after the stop codon of the detected ORF
Value
GRanges object with the set of translated readthrough regions
Author(s)
Lorenzo Calviello,
See Also
detect_translated_orfs, select_quantify_ORFs, annotate_ORFs, get_reathr_seq

detect_translated_orfs

detect_translated_orfs
Detect actively translated ORFs

Description
This function detects translated ORFs
Usage
detect_translated_orfs(selected_txs, genome_sequence, annotation, P_sites,
P_sites_uniq, P_sites_uniq_mm, genomic_region, genetic_code,
all_starts = T, nostarts = F, start_sel_cutoff = NA,
start_sel_cutoff_ave = 0.5, cutoff_fr_ave = 0.5)
Arguments
selected_txs
set of selected transcripts, output from select_txs
genome_sequence
BSgenome object
annotation

Rannot object containing annotation of CDS and transcript structures (see prepare_annotation_files)

P_sites

GRanges object with P_sites positions

P_sites_uniq

GRanges object with uniquely mapping P_sites positions

P_sites_uniq_mm
GRanges object with uniquely mapping (with mismatches) P_sites positions
genomic_region GRanges object with genomic coordinates of the genomic region analyzed
genetic_code

GENETIC_CODE table to use

all_starts

get_all_starts parameter for the get_orfs function

nostarts

Stop_Stop parameter for the get_orfs function

start_sel_cutoff
cutoff parameter for the select_start function
start_sel_cutoff_ave
cutoff_ave parameter for the select_start function
cutoff_fr_ave

cutoff parameter for the calc_orf_pval functions

Details
A set of transcripts, together with genome sequence and Ribo-signal are analyzed to extract translated ORFs

from_tx_togen

Value
A list with transcript coordinates, exonic coordinates and statistics for each ORF exonic bin and
junction(from select_txs).
The value for each column is as follows:
ave_pct_fr: average percentage of in-frame reads for each codon in the ORF pct_fr: percentage of in-frame reads in the ORF ave_pct_fr: average percentage of in-frame reads for each
codon in the ORF ave_pct_fr_st: average percentage of in-frame reads per each codon between
the selected start codon and the next candidate one pct_fr_st: percentage of in-frame reads between the selected start codon and the next candidate one longest_ORF: GRanges coordinates
for the longest ORF with the same stop codon pval: P-value for the multitaper F-test at 1/3 using the ORF P_sites profile pval_uniq: P-value for the multitaper F-test at 1/3 using the ORF
P_sites profile (only uniquely mapping reads) P_sites_raw: Raw number of P_sites mapping to
the ORF pct_uniq: Percentage of raw number of P_sites mapping to the ORF TrP_raw: Raw
multitaper spectral coefficient at 1/3 using the P_sites ORF signal ORF_id_tr: ORF id containing
__ Protein: AAString sequence of the translated protein region: Genomic
coordinates of the analyzed region gene_id: gene_id for the corresponding analyzed transcript
gene_biotype: gene biotype for the corresponding analyzed transcript gene_name: gene name
for the corresponding analyzed transcript transcript_id: transcript_id for the corresponding analyzed ORF transcript_biotype: transcript biotype for the corresponding analyzed ORF
Author(s)
Lorenzo Calviello,
See Also
select_txs, get_orfs, take_Fvals_spect, select_start, prepare_annotation_files

from_tx_togen

Map transcript coordinates to genomic coordinates

Description
This function uses the mapFromTranscripts function to switch between transcript and genomic
coordinates
Usage
from_tx_togen(ORFs, exons, introns)
Arguments
ORFs

Set of detected ORFs from the calc_orf_pval function

exons

exonic regions of the analyzed transcripts, as a GRangesList object

introns

intronic regions of the analyzed transcripts, as a GRangesList object

get_orfs

Value
exonic coordinates for each ORF.
Author(s)
Lorenzo Calviello,
See Also
mapFromTranscripts

get_orfs

Find ATG-starting ORFs in a sequence

Description
This function loads the annotation created by the prepare_annotation_files function
Usage
get_orfs(tx_name, sequence, get_all_starts = T, Stop_Stop = F,
scores = c(1, 0.5), genetic_code_table)
Arguments
tx_name

transcript_id

sequence

DNAString object containing the sequence of the transcript

get_all_starts Output all possible start codons? Defaults to TRUE
Stop_Stop

Find Stop-Stop pairs (no defined start codon)? Defaults to FALSE

scores
Deprecated
genetic_code_table
GENETIC_CODE table to use
Value
GRanges object containing coordinates for the detected ORFs
Author(s)
Lorenzo Calviello,
See Also
detect_translated_orfs

get_ps_fromsplicemin

Offset spliced reads on minus strand

Description
This function calculates P-sites positions for spliced reads on the minus strand
Usage
get_ps_fromsplicemin(x, cutoff)
Arguments
x

a GAlignments object with a cigar string

cutoff

number representing the offset value

Value
a GRanges object with offset reads
Author(s)
Lorenzo Calviello,
See Also
prepare_for_SaTAnn

get_ps_fromspliceplus Offset spliced reads on plus strand

Description
This function calculates P-sites positions for spliced reads on the plus strand
Usage
get_ps_fromspliceplus(x, cutoff)
Arguments
x

a GAlignments object with a cigar string

cutoff

number representing the offset value

get_reathr_seq

Value
a GRanges object with offset reads
Author(s)
Lorenzo Calviello,
See Also
prepare_for_SaTAnn

get_reathr_seq

Extract possible readthrough sequences (beta)

Description
This function extracts readthrough regions for subsequent analysis
Usage
get_reathr_seq(tx_name, orf, sequence, genetic_code)
Arguments
tx_name

transcript_id

orf

transcript-level ORF coordinates

sequence

DNAString object containing the sequence of the transcript

genetic_code

GENETIC_CODE table to use

Details
The function looks for stop-stop pairs after the stop codon of the detected ORF
Value
GRanges object with the set of possible readthrough sequences
Author(s)
Lorenzo Calviello,
See Also
detect_translated_orfs, select_quantify_ORFs

load_annotation

Load genomic features and genome sequence

load_annotation

Description
This function loads the annotation created by the prepare_annotation_files function
Usage
load_annotation(path)
Arguments
path

Full path to the *Rannot R file in the annotation directory used in the prepare_annotation_files funct

Value
introduces a GTF_annotation object and a genome_seq object in the parent environment
Author(s)
Lorenzo Calviello,
See Also
prepare_annotation_files

plot_SaTAnn_results

Plot general statistics about SaTAnn results

Description
This function produces a series of plots and statistics about the set ORFs called by SaTAnn compared to the annotation. IMPORTANT: Use only on transcriptome-wide SaTAnn results. See
run_SaTAnn
Usage
plot_SaTAnn_results(for_SaTAnn_file, SaTAnn_output_file, annotation_file,
coverage_file_plus = NA, coverage_file_minus = NA,
output_plots_path = NA, prefix = NA)

plot_SaTAnn_results

Arguments
for_SaTAnn_file
path to the "for_SaTAnn" file containing P_sites positions and junction reads

SaTAnn_output_file
Full path to the "_final_SaTAnn_results" RData object output by SaTAnn. See
run_SaTAnn
annotation_file
Full path to the *Rannot R file in the annotation directory used in the prepare_annotation_files funct
coverage_file_plus
Full path to a Ribo-seq coverage (no P-sites but read coverage) bigwig file (plus
strand), as the ones created by RiboseQC
coverage_file_minus
Full path to a Ribo-seq coverage (no P-sites but read coverage) bigwig file (minus strand), as the ones created by RiboseQC
output_plots_path
Full path to the directory where plots in .pdf format are stored.
prefix

prefix appended to output filenames

Value
the function exports a RData object (*SaTAnn_plots_RData) containing data to produce all plots,
and produces different QC plots in .pdf format. The plots created are as follows:
ORFs_found: Number of ORF categories detected per gene biotype.
ORFs_found_pct_tr: Distribution of ORF_pct_P_sites ( ORFs_found_P_sites_pNpM: Distribution of ORF_P_sites_pNpM (P-sites per nucleotide per Million, similar to TPM) for different ORF
categories and gene biotypes.
ORFs_found_len: Distribution of ORF length for different ORF categories and gene biotypes.
ORFs_genes: Number of detected ORFs per gene.
ORFs_genes_tpm: Gene level TPM values, plotted by number of ORFs detected.
ORFs_maxiso: Number of genes plotted against the percentages of gene translation of their most
translated ORF.
ORFs_maxiso_tpm: Gene level TPM values, plotted against the percentages of gene translation of
their most translated ORF.
Sel_txs_genes: Number of genes plotted against the number of selected transcripts.
Sel_txs_genes_tpm: Gene level TPM values, plotted against the number of selected transcripts.
Sel_txs_genes_pct: Percentages of annotated trascripts per gene, plotted against the number of
selected transcripts.
Sel_txs_bins_juns: Percentages of covered exonic bins or junctions, using all annotated transcripts, coding transcripts only, or the set of selected transcripts.
Meta_splicing_coverage: Aggregate signal of Ribo-seq coverage and normalized ORF coverage
across different splice sites combinations, with different mixtures of translated overlapping ORFs.
Author(s)
Lorenzo Calviello,

prepare_annotation_files

See Also
run_SaTAnn

prepare_annotation_files
Prepare comprehensive sets of annotated genomic features

Description
This function processes a gtf file and a twobit file (created using faToTwoBit from ucsc tools:
http://hgdownload.soe.ucsc.edu/admin/exe/ ) to create a comprehensive set of genomic regions of
interest in genomic and transcriptomic space (e.g. introns, UTRs, start/stop codons). In addition,
by linking genome sequence and annotation, it extracts additional info, such as gene and transcript
biotypes, genetic codes for different organelles, or chromosomes and transcripts lengths.
Usage
prepare_annotation_files(annotation_directory, twobit_file, gtf_file,
scientific_name = "Homo.sapiens", annotation_name = "genc25",
export_bed_tables_TxDb = T, forge_BSgenome = T, create_TxDb = T)
Arguments
annotation_directory
The target directory which will contain the output files
twobit_file

Full path to the genome file in twobit format

gtf_file
Full path to the annotation file in GTF format
scientific_name
A name to give to the organism studied; must be two words separated by a ".",
defaults to Homo.sapiens
annotation_name
A name to give to annotation used; defaults to genc25
export_bed_tables_TxDb
Export coordinates and info about different genomic regions in the annotation_directory?
It defaults to TRUE
forge_BSgenome Forge and install a BSgenome package? It defaults to TRUE
create_TxDb

Create a TxDb object and a *Rannot object? It defaults to TRUE

Details
This function uses the makeTxDbFromGFF function to create a TxDb object and extract genomic
regions and other info to a *Rannot R file; the mapToTranscripts and mapFromTranscripts functions are used to map features to genomic or transcript-level coordinates. GTF file mist contain
"exon" and "CDS" lines, where each line contains "transcript_id" and "gene_id" values. Additional
values such as "gene_biotype" or "gene_name" are also extracted. Regarding sequences, the twobit

prepare_annotation_files
file, together with input scientific and annotation names, is used to forge and install a BSgenome
package using the forgeBSgenomeDataPkg function.
The resulting GTF_annotation object (obtained after runnning load_annotation) contains:
txs: annotated transcript boundaries.
txs_gene: GRangesList including transcript grouped by gene.
seqinfo: indicating chromosomes and chromosome lengths.
start_stop_codons: the set of annotated start and stop codon, with respective transcript and
gene_ids. reprentative_mostcommon,reprentative_boundaries and reprentative_5len represent the
most common start/stop codon, the most upstream/downstream start/stop codons and the start/stop
codons residing on transcripts with the longest 5’UTRs
cds_txs: GRangesList including CDS grouped by transcript.
introns_txs: GRangesList including introns grouped by transcript.
cds_genes: GRangesList including CDS grouped by gene.
exons_txs: GRangesList including exons grouped by transcript.
exons_bins: the list of exonic bins with associated transcripts and genes.
junctions: the list of annotated splice junctions, with associated transcripts and genes.
genes: annotated genes coordinates.
threeutrs: collapsed set of 3’UTR regions, with correspinding gene_ids. This set does not overlap
CDS region.
fiveutrs: collapsed set of 5’UTR regions, with correspinding gene_ids. This set does not overlap
CDS region.
ncIsof: collapsed set of exonic regions of protein_coding genes, with correspinding gene_ids. This
set does not overlap CDS region.
ncRNAs: collapsed set of exonic regions of non_coding genes, with correspinding gene_ids. This
set does not overlap CDS region.
introns: collapsed set of intronic regions, with correspinding gene_ids. This set does not overlap
exonic region.
intergenicRegions: set of intergenic regions, defined as regions with no annotated genes on either strand.
trann: DataFrame object including (when available) the mapping between gene_id, gene_name,
gene_biotypes, transcript_id and transcript_biotypes.
cds_txs_coords: transcript-level coordinates of ORF boundaries, for each annotated coding transcript. Additional columns are the same as as for the start_stop_codons object.
genetic_codes: an object containing the list of genetic code ids used for each chromosome/organelle.
see GENETIC_CODE_TABLE for more info.
genome_package: the name of the forged BSgenome package. Loaded with load_annotation
function.
stop_in_gtf: stop codon, as defined in the annotation.

Value
a TxDb file and a *Rannot files are created in the specified annotation_directory. In addition, a
BSgenome object is forged, installed, and linked to the *Rannot object

prepare_for_SaTAnn

Author(s)
Lorenzo Calviello,
See Also
load_annotation, forgeBSgenomeDataPkg, makeTxDbFromGFF, run_SaTAnn.

prepare_for_SaTAnn

Prepare the "for_SaTAnn" file

Description
Prepare the "for_SaTAnn" file
Usage
prepare_for_SaTAnn(annotation_file, bam_file,
path_to_rl_cutoff_file = NA, chunk_size = 5e+06,
path_to_P_sites_plus_bw = NA, path_to_P_sites_minus_bw = NA,
path_to_P_sites_uniq_plus_bw = NA,
path_to_P_sites_uniq_minus_bw = NA,
path_to_P_sites_uniq_mm_plus_bw = NA,
path_to_P_sites_uniq_mm_minus_bw = NA, dest_name = NA)
Arguments
annotation_file
Full path to the annotation file (*Rannot)
bam_file
Full path to the bam file
path_to_rl_cutoff_file
path to the rl_cutoff_file file specifying in 3 columns the read lengths, cutoffs
and compartments ("nucl" for standard chromosomes)
the number of alignments to read at each iteration, defaults to 5000000, increase
when more RAM is available
path_to_P_sites_plus_bw
path to a bigwig file containing P_sites positions on the plus strand
path_to_P_sites_minus_bw
path to a bigwig file containing P_sites positions on the minus strand
path_to_P_sites_uniq_plus_bw
(Optional) path to a bigwig file containing uniquely mapping P_sites positions
on the plus strand
path_to_P_sites_uniq_minus_bw
(Optional) path to a bigwig file containing uniquely mapping P_sites positions
on the minus strand
chunk_size

run_SaTAnn
path_to_P_sites_uniq_mm_plus_bw
(Optional) path to a bigwig file containing uniquely mapping (with mismatches)
P_sites positions on the plus strand
path_to_P_sites_uniq_mm_minus_bw
(Optional) path to a bigwig file containing uniquely mapping (with mismatches)
P_sites positions on the minus strand
dest_name

prefix to use for the output files. Defaults to same as bam_file (appends "for_SaTAnn"
to its filename)

Details
This function uses a list of pre-determined read lengths, cutoffs and compartments to calculate
P_sites positions.
Alternatively, bigwig files containing P_sites position for each strand can be specified. Optional
bigwig files for uniquely mapping P_sites position (with and without mismatches) can be specified
to obtain more statistics on the SaTAnn-identified ORFs
Author(s)
Lorenzo Calviello,
See Also
run_SaTAnn

run_SaTAnn

Run the SaTAnn pipeline

Description
This wrapper function runs the entire SaTAnn pipeline
Usage
run_SaTAnn(for_SaTAnn_file, annotation_file, n_cores,
prefix = for_SaTAnn_file, gene_name = NA, gene_id = NA,
genomic_region = NA, write_temp_files = T, write_GTF_file = T,
write_protein_fasta = T, interactive = T,
stn.orf_find.all_starts = T, stn.orf_find.nostarts = F,
stn.orf_find.start_sel_cutoff = NA,
stn.orf_find.start_sel_cutoff_ave = 0.5,
stn.orf_find.cutoff_fr_ave = 0.5, stn.orf_quant.cutoff_cums = NA,
stn.orf_quant.cutoff_pct = 2, stn.orf_quant.cutoff_P_sites = NA)

run_SaTAnn

Arguments
for_SaTAnn_file
REQUIRED - path to the "for_SaTAnn" file containing P_sites positions and
junction reads
annotation_file
REQUIRED - path to the *Rannot R file in the annotation directory used in the
prepare_annotation_files function
n_cores

REQUIRED - number of cores to use

prefix

prefix to use for the output files. Defaults to same as for_SaTAnn_file (appends to its filename)

gene_name

character vector of gene names to analyze.

gene_id

character vector of gene ids to analyze

genomic_region GRanges object with genomic regions to analyze
write_temp_files
write temporary files. Defaults to TRUE
write_GTF_file write a GTF files with the ORF coordinates. Defaults to TRUE
write_protein_fasta
write a protein fasta file. Defaults to TRUE
interactive

should put R object in global environment? Defaults to TRUE

stn.orf_find.all_starts
orf_find.all_starts parameter for the SaTAnn function
stn.orf_find.nostarts
orf_find.nostarts parameter for the SaTAnn function
stn.orf_find.start_sel_cutoff
orf_find.start_sel_cutoff parameter for the SaTAnn function
stn.orf_find.start_sel_cutoff_ave
orf_find.start_sel_cutoff_ave parameter for the SaTAnn functio
stn.orf_find.cutoff_fr_ave
orf_find.cutoff_fr_ave parameter for the SaTAnn function
stn.orf_quant.cutoff_cums
orf_quant.cutoff_cums parameter for the SaTAnn function
stn.orf_quant.cutoff_pct
orf_quant.cutoff_pct parameter for the SaTAnn function
stn.orf_quant.cutoff_P_sites
orf_quant.cutoff_P_sites parameter for the SaTAnn function
Details
A set of transcripts, together with genome sequence and Ribo-signal are analyzed to extract translated ORFs

SaTAnn

Value
A set of output files containing transcript coordinates, exonic coordinates and annotation for each
ORF, including optional GTF and protein fasta files.
The description for each list object is as follows:
tmp_SaTAnn_results: (Optional) RData object file containing the entire set of results for each
genomic region.
final_SaTAnn_results: RData object file containing the final SaTAnn results, see SaTAnn.
Protein_sequences.fasta: (Optional) Fasta file containing the set of translated proteins .
Detected_ORFs.gtf: GTF file containing coordinates of the detected ORFs.
In addition, new columns are added in the ORFs_tx file:
TrP_pM: (Beta) multitaper spectral coefficient of the P_sites track for each ORF, summing up to
a million.
TrP_pN: (Beta) multitaper spectral coefficient of the P_sites track for each ORF, divided by ORF
length.
TrP_pNpM: (Beta) multitaper spectral coefficient of the P_sites track for each ORF, divided by ORF
length and summing up to a million (akin to TPM).
P_sites_pM: number of P_sites for each ORF, summing up to a million.
P_sites_pN: number of P_sites for each ORF, divided by ORF length.
P_sites_pNpM: number of P_sites for each ORF, divided by ORF length and summing up to a million (akin to TPM).

Author(s)
Lorenzo Calviello,
See Also
prepare_annotation_files, load_annotation, SaTAnn

SaTAnn

Detection, quantification and annotation of translated ORFs in a genomic region

Description
This function detects, quantifies and annotates actively translated ORF in a genomic region
Usage
SaTAnn(region, for_SaTAnn, genetic_code_region, orf_find.all_starts = T,
orf_find.nostarts = F, orf_find.start_sel_cutoff = NA,
orf_find.start_sel_cutoff_ave = 0.5, orf_find.cutoff_fr_ave = 0.5,

SaTAnn

orf_quant.cutoff_cums = NA, orf_quant.cutoff_pct = 2,
orf_quant.cutoff_P_sites = NA)
Arguments
region

GRanges object with genomic coordinates of the genomic region analyzed

for_SaTAnn
"for_SaTAnn" Robject containing P_sites positions and junction reads
genetic_code_region
GENETIC_CODE table to use
orf_find.all_starts
get_all_starts parameter for the detect_translated_orfs function
orf_find.nostarts
Stop_Stop parameter for the detect_translated_orfs function
orf_find.start_sel_cutoff
cutoff parameter for the detect_translated_orfs function
orf_find.start_sel_cutoff_ave
cutoff_ave parameter for the detect_translated_orfs function
orf_find.cutoff_fr_ave
cutoff parameter for the detect_translated_orfs function
orf_quant.cutoff_cums
cutoff_cums parameter for the select_quantify_ORFs function
orf_quant.cutoff_pct
cutoff_pct parameter for the select_quantify_ORFs function
orf_quant.cutoff_P_sites
cutoff_P_sites parameter for the select_quantify_ORFs function
Details
A set of transcripts, together with genome sequence and Ribo-signal are analyzed to extract translated ORFs
Value
A list containing transcript coordinates, exonic coordinates and annotation for each ORF.
The description for each list object is as follows:
ORFs_tx: transcript coordinates of the detected ORFs.
ORFs_gen: genomic (exon) coordinates of the detected ORFs.
ORFs_feat: list of ORF features together with mapping reads and uniqueness.
ORFs_txs_feats: list of transcript features present in the genomic region, together with mapping
reads and uniqueness.
ORFs_spl_feat_longest: splicing annotation for each ORF exon, with respect to the longest annotated coding transcript for each gene.
ORFs_spl_feat_maxORF: splicing annotation for each ORF exon, with respect to the most translated ORF in each gene.
selected_txs: character vector containing the transcript ids of the selected transcripts.

select_quantify_ORFs
ORFs_readthroughs: (Beta) transcript coordinates of the detected ORFs readthroughs.

Author(s)
Lorenzo Calviello,
See Also
select_txs, detect_translated_orfs, select_quantify_ORFs, annotate_ORFs, detect_readthrough

select_quantify_ORFs

Select and quantify ORF translation

Description
This function selects a subset of detected ORFs and quantifies their translation
Usage
select_quantify_ORFs(results_ORFs, P_sites, P_sites_uniq,
cutoff_cums = NA, cutoff_pct = 2, cutoff_P_sites = NA,
optimiz = FALSE, scaling = TRUE)
Arguments
results_ORFs

Full list of detected ORFs, from detect_translated_ORFs

P_sites

GRanges object with P_sites positions

P_sites_uniq

GRanges object with uniquely mapping P_sites positions

cutoff_cums

cutoff to select ORFs until percentage of total gene translation. Defaults to
99

cutoff_pct

minimum percentage of total gene translation for an ORF to be selected. Defaults to 1

cutoff_P_sites minimum number of P_sites assigned to the ORF to be selected. Defaults to 10
optimiz

(Beta) should numerical optimization (minimizing distance between observed
coverage and expected coverage) be used to quantify ORF translation? Defaults
to FALSE

scaling

Additional scaling value taking into account total signal on the detected ORFs
to adjust quantification estimates (recommended). Defaults to TRUE

select_quantify_ORFs

Details
ORFs are first selected using the same method as in the select_txs function, but using ORF features (ORF structures are treated as transcript structures).
Ribo-seq coverage (reads/length) on bins and junctions (set to a length of 60) is used to derive a
scaling factor (0-1) for each ORF, which indicates how much of the ORF coverage can be assigned
to such ORF (1 when no other ORF is present). When no unique features are present on an ORF,
an adjusted scaling value is calculated subtracting coverage expected from a ORF with a unique
feature. When no unique features are present on any ORF, scaling values are calculated assuming
uniform coverage on each ORF.
ORFs are then further filtered to exclude lowly translated ORFs and quantification/selection is reiterated until no ORF is further filtered out. Percentage of total gene translation and length-adjusted
quantification estimates are produced. More details about the quantificatin procedure can be found
in the SaTAnn manuscript.
Additional columns are added to the ORFs_tx object:
TrP: TrP_raw values (spectral coefficient) from detect_translated_ORFs divided by the ORF
scaling value.
ORF_pct_TrP: Percentage of gene translation output for the ORF, derived using TrP values.
ORF_pct_TrP_pN: Percentage of gene translation ouptut (adjusted by length) for the ORF, derived
using TrP values.
P_sites: P_sites_raw value from detect_translated_ORFs divided by the ORF scaling value.
ORF_pct_P_sites: Percentage of gene translation output for the ORF, derived using P_sites values.
ORF_pct_P_sites_pN: Percentage of gene translation ouptut (adjusted by length) for the ORF, derived using P_sites values.
unique_features_reads: initial number of reads on each unique ORF feature. NA when no unique
feature is present.
adj_unique_features_reads: final number of reads on each unique ORF feature after the ORF
filtering/quantification procedure. NA when no unique feature is present.
scaling_factors: Set of 3 scaling factors assigned to the ORF using intial unique ORF features,
after adjusting for the presence of ORFs with no unique features, and final scaling factor after
correcting for total Ribo-seq coverage on the gene.

Value
modified results_ORFs object with the selected ORFs including quantification estimates.

Author(s)
Lorenzo Calviello,

See Also
detect_translated_orfs, select_txs

select_start

Select start codon

Description
This function selects the start codon for ORFs in the same transcript
Usage
select_start(ORFs, P_sites_rle, cutoff = NA, cutoff_ave = 0.5)
Arguments
ORFs

Set of detected ORFs

P_sites_rle

Rle signal of P_sites along the transcript

cutoff

cutoff of total in-frame signal between start codons (sensitive to outliers). Defaults to NA

cutoff_ave

cutoff for frequency of in-frame codons between two start codons (less sensitive
to outliers). Defaults to .5

Details
ORFs are divided based on stop codon and Ribo-seq signal between start codons is used to select
one.
When more than cutoff_ave fraction of codons is in-frame between two candidate start codons,
the most upstream is selected.
Value
Set of detected ORFs, including info about the possible longest ORF for that frame.
Author(s)
Lorenzo Calviello,
See Also
detect_translated_orfs, get_orfs

select_txs

Select a subset of transcripts with Ribo-seq data

Description
This function flattens all annotated transcript structures and uses Ribo-seq to select a subset of
transcripts.
Usage
select_txs(region, annotation, P_sites, P_sites_uniq, junction_counts)
Arguments
region

genomic region being analyzed

annotation

Rannot object containing annotation of CDS and transcript structures (see prepare_annotation_files)

P_sites

GRanges object with P_sites positions

P_sites_uniq

GRanges object with uniquely mapping P_sites positions

junction_counts
GRanges object containing Ribo-seq counts on the set of annotated junctions
Details
Features (bins and junctions) are divided into shared and unique features, and into with support and
without support (with or without reads mapping). A set of logical rules filters out transcripts with
internal features with no support and no unique features with reads. More specific details can be
found in the SaTAnn manuscript.
Value
GRanges object with the set of counts on each exonic bin and junctions, together with the list of
selected transcripts
Author(s)
Lorenzo Calviello,
See Also
prepare_annotation_files

take_Fvals_spect

Extract output from multitaper analysis of a signal

Description
This function uses the multitaper tool to extract F-values and multitaper spectral coefficients
Usage
take_Fvals_spect(x, n_tapers, time_bw, slepians_values)
Arguments
x

numeric signal to analyze

n_tapers

n of tapers to use

time_bw
time_bw parameter
slepians_values
set of calculated slepian functions to use in the multitaper analysis
Details
Values reported correspond to the closest frequency to 1/3 (same parameters as in RiboTaper).
Padding to a minimum length of 1024 is performed to increase spectral resolution.
Value
two numeric values representing the F-value for the multitaper test and its corresponding spectral
coefficient at the closest frequency to 1/3
Author(s)
Lorenzo Calviello,
See Also
detect_translated_orfs, spec.mtm, dpss

Index
∗Topic Ribo-seQC,
get_ps_fromsplicemin, 11
get_ps_fromspliceplus, 11
prepare_annotation_files, 15
∗Topic Ribo-seQC
load_annotation, 13
plot_SaTAnn_results, 13
∗Topic SaTAnn,
load_annotation, 13
plot_SaTAnn_results, 13
∗Topic SaTAnn
annotate_ORFs, 2
annotate_splicing, 4
calc_orf_pval, 5
create_SaTAnn_html_report, 6
detect_readthrough, 7
detect_translated_orfs, 8
from_tx_togen, 9
get_orfs, 10
get_ps_fromsplicemin, 11
get_ps_fromspliceplus, 11
get_reathr_seq, 12
prepare_annotation_files, 15
prepare_for_SaTAnn, 17
run_SaTAnn, 18
SaTAnn, 20
select_quantify_ORFs, 22
select_start, 24
select_txs, 25
take_Fvals_spect, 26

dpss, 26
forgeBSgenomeDataPkg, 17
from_tx_togen, 9
get_orfs, 5, 9, 10, 24
get_ps_fromsplicemin, 11
get_ps_fromspliceplus, 11
get_reathr_seq, 7, 12
load_annotation, 13, 17, 20
makeTxDbFromGFF, 17
mapFromTranscripts, 10
plot_SaTAnn_results, 6, 13
prepare_annotation_files, 9, 13, 15, 20,
25
prepare_for_SaTAnn, 11, 12, 17
run_SaTAnn, 6, 15, 17, 18, 18
SaTAnn, 20, 20
select_quantify_ORFs, 4, 7, 12, 22, 22
select_start, 9, 24
select_txs, 9, 22, 23, 25
spec.mtm, 26
take_Fvals_spect, 5, 9, 26

annotate_ORFs, 2, 5, 7, 22
annotate_splicing, 4, 4
calc_orf_pval, 5
create_SaTAnn_html_report, 6
detect_readthrough, 7, 22
detect_translated_orfs, 5, 7, 8, 10, 12,
22–24, 26
27

Source Exif Data:

File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Count                      : 27
Page Mode                       : UseOutlines
Author                          : 
Title                           : 
Subject                         : 
Creator                         : LaTeX with hyperref package
Producer                        : pdfTeX-1.40.18
Create Date                     : 2019:04:12 18:25:54-07:00
Modify Date                     : 2019:04:12 18:25:54-07:00
Trapped                         : False
PTEX Fullbanner                 : This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) kpathsea version 6.2.3

EXIF Metadata provided by EXIF.tools

Sa TAnn Manual

Navigation menu

Versions of this User Manual:

Views

Navigation