% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/shift_footprints.R
\name{detectRibosomeShifts}
\alias{detectRibosomeShifts}
\title{Detect ribosome shifts}
\usage{
detectRibosomeShifts(
  footprints,
  txdb,
  start = TRUE,
  stop = FALSE,
  top_tx = 10L,
  minFiveUTR = 30L,
  minCDS = 150L,
  minThreeUTR = if (stop) {
     30
 } else NULL,
  txNames = filterTranscripts(txdb, minFiveUTR, minCDS, minThreeUTR),
  firstN = 150L,
  tx = NULL,
  min_reads = 1000,
  min_reads_TIS = 50,
  accepted.lengths = 26:34,
  heatmap = FALSE,
  must.be.periodic = TRUE,
  strict.fft = TRUE,
  verbose = FALSE
)
}
\arguments{
\item{footprints}{\code{\link{GAlignments}} object of RiboSeq reads
(footprints); can also be a path to a .bam or .ofst file. If the GAlignments
object has a meta column called "score", it is used as the number of
replicates of that read. So be careful with custom files whose score column
has another meaning.}

\item{txdb}{a TxDb file, a path to one of:
(.gtf, .gff, .gff2, .db or .sqlite),
or an ORFik experiment}

\item{start}{(logical) Whether to include predictions based on the start
codons. Default TRUE.}

\item{stop}{(logical) Whether to include predictions based on the stop
codons. Default FALSE. Only use if 3' UTRs exist in the annotation.
If periodicity around the stop codon is stronger than at the start codon,
use the stop region instead of the start region for p-shifting.}

\item{top_tx}{(integer), default 10. Percentage of transcripts, ranked by TIS
coverage, to use for estimating the shifts. By default the top 10\% most
covered transcripts are used, as they give a less noisy data set. This only
applies when there are more than 1000 transcripts.}

\item{minFiveUTR}{(integer) minimum bp for 5' UTR during filtering for the
transcripts. Set to NULL if no 5' UTRs exist in the annotation.}

\item{minCDS}{(integer) minimum bp for CDS during filtering for the
transcripts}

\item{minThreeUTR}{(integer) minimum bp for 3' UTR during filtering for the
transcripts. Set to NULL if no 3' UTRs exist in the annotation.}

\item{txNames}{a character vector with the subset of CDS to use. Default:
txNames = filterTranscripts(txdb, minFiveUTR, minCDS, minThreeUTR)\cr
Example:
c("ENST1000005") will use only that transcript (you should use at least 100!).
Remember that the top_tx argument by default keeps only the top 10 \%
of those CDSs. Set top_tx to 100 to use all the specified transcripts.}

\item{firstN}{(integer) Represents how many bases of the transcripts
downstream of start codons to use for initial estimation of the
periodicity.}

\item{tx}{a GRangesList, default NULL. If you do not have 5' UTRs in the
annotation, send your own extended version. Example: extendLeaders(tx, 30),
where 30 bases will become the new "leaders". This is needed because, in such
an annotation, each original transcript is either CDS only or non-coding
(and filtered out).}

\item{min_reads}{default 1000. Minimum number of reads a read length must
have in total to be considered for periodicity.}

\item{min_reads_TIS}{default 50. Minimum number of reads a read length must
have in the TIS region to be considered for periodicity.}

\item{accepted.lengths}{accepted read lengths, default 26:34. The ribo-seq
signal is usually strongest between 27:32.}

\item{heatmap}{a logical or character string, default FALSE.
If TRUE, a heatmap of the raw reads before p-shifting is plotted to the
console, so you can check whether the given shifts make sense.
If a character string, it is used as the file path to save the plot to.}

\item{must.be.periodic}{logical, default TRUE. If FALSE, read lengths are not
filtered on periodicity (the Fourier transform filter is skipped).
This is useful if you are not going to do periodicity analysis, that is,
when more coverage depth (more read lengths) matters more to you
than keeping only the high quality periodic read lengths.}

\item{strict.fft}{logical, default TRUE. Use an FFT without a noise filter.
This keeps only read lengths that are "periodic for the human eye".
If you want more coverage, set to FALSE to also keep read lengths
that look "messy" but for which the noise filter still detects the
periodicity of 3. Only do this when you do not need high quality
periodic reads! An example would be differential translation analysis by
counts over each ORF.}

\item{verbose}{logical, default FALSE.
Report details of analysis/periodogram. Good if you are not sure
if the analysis was correct.}
}
\value{
a data.table with lengths of footprints and their predicted
corresponding offsets
}
\description{
Uses a periodicity measurement (Fourier transform) and change point
analysis to detect ribosomal footprint shifts for each of the ribosomal
read lengths. Returns the subset of read lengths, with their shifts, for
which the top covered transcripts pass the periodicity measure. Each shift
value assumes 5' anchoring of the reads, so that the output offsets will
shift 5' anchored footprints onto the P-site of the ribosome. The
E-site will be shift + 3 and the A-site will be shift - 3, so adjust the
offsets accordingly if you want those sites instead.
}
\details{
See the vignette for examples of plotting RiboSeq metaplots over start
and stop codons, so that you can verify visually whether this function
detects the correct shifts.

For how the Fourier transform works, see: \code{\link{isPeriodic}}\cr
For how the changepoint analysis works, see: \code{\link{changePointAnalysis}}\cr
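
As a rough intuition for the Fourier transform step, the base R sketch below
simulates a coverage vector with a period of 3 and recovers that period from
the strongest frequency. It is only an illustration on made-up data: the
variable names are hypothetical and the actual filter in
\code{\link{isPeriodic}} may differ in its details.

\preformatted{
set.seed(1)
# Simulated coverage over 150 bases: strong signal on every third position
cov_demo <- rep(c(10, 2, 1), times = 50) + runif(150)
n <- length(cov_demo)
# Squared amplitudes of the discrete Fourier transform (mean removed)
amp <- Mod(fft(cov_demo - mean(cov_demo)))^2
# Keep the positive frequencies 1/n, 2/n, ..., 1/2
k <- seq_len(n / 2)
freq <- k / n
# Period (in bases) of the dominant frequency; 3 for this simulated signal
period_demo <- 1 / freq[which.max(amp[k + 1])]
}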

NOTE: Soft clips are removed from the valid read width. For example, the
CIGAR 3S30M has a qwidth of 33, but ORFik removes the 3S, so the final read
width is 30. This is standard for ribo-seq.
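
The arithmetic in the note above can be checked with a small base R sketch
that parses the CIGAR string by hand. This is not ORFik's internal code; the
variable names are made up for this illustration.

\preformatted{
cigar_demo <- "3S30M"
# Split the CIGAR into its operations, here "3S" and "30M"
ops <- regmatches(cigar_demo, gregexpr("[0-9]+[A-Z=]", cigar_demo))[[1]]
op_len <- as.integer(sub("[A-Z=]$", "", ops))
op_type <- sub("^[0-9]+", "", ops)
# qwidth counts all operations that consume the read: M, I, S, = and X
qwidth <- sum(op_len[is.element(op_type, c("M", "I", "S", "=", "X"))])
# Read width after dropping the soft clipped bases (S)
width_noclip <- sum(op_len[is.element(op_type, c("M", "I", "=", "X"))])
}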
}
\examples{
## Basic run
# Transcriptome annotation ->
gtf_file <- system.file("extdata/references/danio_rerio", "annotations.gtf", package = "ORFik")
# Ribo seq data ->
riboSeq_file <- system.file("extdata/Danio_rerio_sample", "ribo-seq.bam", package = "ORFik")
\dontrun{
footprints <- readBam(riboSeq_file)
## Using CDS start site as reference point:
detectRibosomeShifts(footprints, gtf_file)
## Using CDS start site and stop site as 2 reference points:
#detectRibosomeShifts(footprints, gtf_file, stop = TRUE)
## Debug and detailed information for accepted read lengths and p-site:
detectRibosomeShifts(footprints, gtf_file, heatmap = TRUE, verbose = TRUE)
## Debug why read length 31 was not accepted or wrong p-site:
#detectRibosomeShifts(footprints, gtf_file, must.be.periodic = FALSE,
#              accepted.lengths = 31, heatmap = TRUE, verbose = TRUE)

## Subset bam file (ScanBamParam and scanBamFlag come from Rsamtools)
library(Rsamtools)
param <- ScanBamParam(flag = scanBamFlag(
                        isDuplicate = FALSE,
                        isSecondaryAlignment = FALSE))
footprints <- readBam(riboSeq_file, param = param)
detectRibosomeShifts(footprints, gtf_file, stop = TRUE)

## Without 5' UTR annotation
library(GenomicFeatures)

txdb <- loadTxdb(gtf_file)
tx <- exonsBy(txdb, by = "tx", use.names = TRUE)
tx <- extendLeaders(tx, 30)
## Now run function, without 5' and 3' UTRs
detectRibosomeShifts(footprints, txdb, start = TRUE, minFiveUTR = NULL,
                     minCDS = 150L, minThreeUTR = NULL, firstN = 150L,
                     tx = tx)

}
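
## Sketch: converting P-site shifts to E-site and A-site offsets,
## following the rule in the description (E = shift + 3, A = shift - 3).
## The table below is hand-made for illustration and is not real output.
## The column names (fraction, offsets_start) are an assumption, so check
## names() on your own result before reusing this.
shifts_demo <- data.frame(fraction = c(28L, 29L, 30L),
                          offsets_start = c(-12L, -12L, -13L))
shifts_demo$offsets_E <- shifts_demo$offsets_start + 3L
shifts_demo$offsets_A <- shifts_demo$offsets_start - 3L
shifts_demo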

}
\references{
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4912-6
}
\seealso{
Other pshifting: 
\code{\link{changePointAnalysis}()},
\code{\link{shiftFootprints}()},
\code{\link{shiftFootprintsByExperiment}()},
\code{\link{shiftPlots}()},
\code{\link{shifts_load}()},
\code{\link{shifts_save}()}
}
\concept{pshifting}
