% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/getClusterMatches.R
\name{getClusterMatches}
\alias{getClusterMatches}
\alias{countClusterMatches}
\title{Find matches from a PWM cluster within an XStringSet}
\usage{
getClusterMatches(
  cl,
  stringset,
  rc = TRUE,
  min_score = "80\%",
  best_only = FALSE,
  break_ties = c("all", "random", "first", "last", "central"),
  mc.cores = 1,
  ...
)

countClusterMatches(
  cl,
  stringset,
  rc = TRUE,
  min_score = "80\%",
  mc.cores = 1,
  ...
)
}
\arguments{
\item{cl}{A list of Position Weight Matrices or universalmotif objects, with
each element representing a cluster of related matrices}

\item{stringset}{An XStringSet}

\item{rc}{logical(1) Also find matches using the reverse complement of PWMs
in the cluster}

\item{min_score}{The minimum score to return a match}

\item{best_only}{logical(1) Only return the best match for each sequence}

\item{break_ties}{Method for breaking ties when only returning the best match.
Ignored when all matches are returned (the default)}

\item{mc.cores}{Passed to \link[parallel]{mclapply}}

\item{...}{Passed to \link[Biostrings]{matchPWM}}
}
\value{
Output from getClusterMatches will be a list of DataFrames with columns:
\code{seq}, \code{score}, \code{direction}, \code{start}, \code{end}, \code{from_centre}, \code{seq_width},
\code{motif} and \code{match}.

The first three columns describe the sequence containing the match, the score
of the match and whether the match was found using the forward or reverse PWM.
The columns \code{start} and \code{end} describe where the match was found
in the sequence and \code{seq_width} gives the width of the sequence being
queried, whilst \code{from_centre} defines the distance between the centre
of the match and the centre of the sequence being queried.
The \code{motif} column denotes which individual motif matched at this
position, again noting that when matches from different motifs overlap, only
the one with the highest relative score is returned.
The final column contains the matching fragment of the sequence as an
\code{XStringSet}.
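
As an illustration only, a \code{from_centre}-style distance can be sketched
in base R as shown below. The helper \code{from_centre_sketch()} is
hypothetical. The convention it uses is an assumption and does not describe
the package internals: match centre minus sequence centre, with 1-based
inclusive coordinates, so negative values fall before the sequence centre.

\preformatted{# Hypothetical helper (not part of this package): distance between the
# centre of a match and the centre of the sequence it was found in.
# Assumed convention: match centre minus sequence centre, using 1-based
# inclusive start/end coordinates
from_centre_sketch <- function(start, end, seq_width) {
  match_centre <- (start + end) / 2
  seq_centre <- (seq_width + 1) / 2
  match_centre - seq_centre
}

# Three 10bp matches at the start, middle and end of a 100bp sequence
from_centre_sketch(start = c(1, 46, 91), end = c(10, 55, 100), seq_width = 100)
}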

Output from countClusterMatches will be an integer vector with one element
per cluster.
}
\description{
Find matches from a PWM cluster within a set of sequences
}
\details{
This function extends \link{getPwmMatches} by returning a single set of
results for each set of clustered motifs.
This can help remove some of the redundancy in results returned for highly
similar PWMs, such as those in the GATA3 family.

Taking a set of sequences as an XStringSet, find all matches above the
supplied score (i.e. threshold) for a list of Position Weight Matrices
(PWMs), which have been clustered together as highly-related motifs.
By default, matches are found using both the PWMs as provided and their
reverse complements; this can be disabled by setting \code{rc = FALSE}.
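
Reverse-complementing a PWM amounts to two steps: reversing its column
(position) order, and swapping the rows of complementary bases (A with T,
C with G). The base R sketch below assumes a numeric matrix with rows named
A, C, G and T. \code{rc_pwm()} is a hypothetical helper and is not necessarily
how the reverse complement is obtained internally.

\preformatted{# Hypothetical helper (not part of this package): reverse complement of a
# PWM stored as a numeric matrix with one row per base and one column per
# position
rc_pwm <- function(pwm) {
  stopifnot(setequal(rownames(pwm), c("A", "C", "G", "T")))
  complement <- c(A = "T", C = "G", G = "C", T = "A")
  # Reverse the positions, then relabel each row by its complementary base
  out <- pwm[, rev(seq_len(ncol(pwm))), drop = FALSE]
  rownames(out) <- unname(complement[rownames(out)])
  # Restore the conventional A, C, G, T row order
  out[c("A", "C", "G", "T"), , drop = FALSE]
}

# A toy 3-position PWM favouring "ACG"; its reverse complement favours "CGT"
pwm <- matrix(
  c( 1.2, -0.5, -0.5, -0.8,
    -0.9,  1.1, -0.4, -0.6,
    -0.7, -0.7,  1.3, -0.9),
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)
rc_pwm(pwm)
}

Applying \code{rc_pwm()} twice returns the original matrix, which is a quick
sanity check for this kind of helper.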

The function relies heavily on \link[Biostrings]{matchPWM} and
\link[IRanges]{Views} for speed.

Where overlapping matches are found for the PWMs within a cluster, only a
single match is returned.
The motif with the highest relative score (score / maxScore(PWM)) is selected.
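
One way to picture this selection is a greedy pass over the matches, ordered
by relative score, that keeps a match only if it does not overlap one already
kept. This is a minimal base R sketch under stated assumptions, not the
package's implementation:
\itemize{
  \item \code{resolve_overlaps()} is a hypothetical helper;
  \item matches are held in a plain \code{data.frame};
  \item \code{max_score} holds each PWM's highest possible score, i.e. the sum
  of its column maxima, as returned by \code{maxScore()} in \pkg{Biostrings}.
}

\preformatted{# Hypothetical helper (not part of this package): among overlapping
# matches within the same sequence, keep the one with the highest
# relative score (score / max_score), using a greedy selection
resolve_overlaps <- function(matches) {
  matches$rel_score <- matches$score / matches$max_score
  # Visit matches from best to worst relative score
  matches <- matches[order(matches$rel_score, decreasing = TRUE), ]
  keep <- logical(nrow(matches))
  for (i in seq_len(nrow(matches))) {
    kept <- matches[keep, ]
    # Does match i overlap any match already kept in the same sequence?
    clash <- kept$seq == matches$seq[i] &
      kept$start <= matches$end[i] & kept$end >= matches$start[i]
    keep[i] <- !any(clash)
  }
  out <- matches[keep, ]
  out[order(out$seq, out$start), ]
}

# Illustrative matches from two PWMs in one cluster. PWM_2 has the higher
# raw score at 10-21, but PWM_1 has the higher relative score (0.8 vs ~0.79)
matches <- data.frame(
  seq = c("s1", "s1", "s1", "s2"),
  motif = c("PWM_1", "PWM_2", "PWM_2", "PWM_1"),
  start = c(10, 12, 40, 5),
  end = c(19, 21, 49, 14),
  score = c(8.0, 9.5, 7.0, 6.0),
  max_score = c(10, 12, 12, 10),
  stringsAsFactors = FALSE
)
resolve_overlaps(matches)
}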

When choosing to return the best match (\code{best_only = TRUE}), only the match
with the highest relative score is returned for each sequence.
Should scores be tied, \code{break_ties} determines which of the tied matches
are returned: all of them (\code{"all"}, the default), one chosen at random
(\code{"random"}), the first (\code{"first"}), the last (\code{"last"}) or
the most central (\code{"central"}).
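
The tie-breaking options can be sketched for the matches of a single sequence
as below. This is a base R illustration only, and it rests on these
assumptions:
\itemize{
  \item \code{pick_best()} is a hypothetical helper;
  \item the matches carry a relative score column, \code{rel_score}, together
  with the \code{start} and \code{from_centre} columns;
  \item \code{"first"} and \code{"last"} refer to position along the sequence;
  \item \code{"central"} means the smallest absolute \code{from_centre}.
}

\preformatted{# Hypothetical helper (not part of this package): keep the best match(es)
# within one sequence, breaking ties between equal relative scores.
# Assumes "first"/"last" refer to position along the sequence and
# "central" to the smallest absolute distance from the centre
pick_best <- function(matches,
                      break_ties = c("all", "random", "first", "last", "central")) {
  break_ties <- match.arg(break_ties)
  # Exact equality is used for simplicity; real scores may need a tolerance
  best <- matches[matches$rel_score == max(matches$rel_score), ]
  if (nrow(best) <= 1 || break_ties == "all") return(best)
  i <- switch(
    break_ties,
    random = sample(nrow(best), 1),
    first = which.min(best$start),
    last = which.max(best$start),
    central = which.min(abs(best$from_centre))
  )
  best[i, ]
}

# Three matches tie on relative score; one scores lower and is never kept
m <- data.frame(
  start = c(5, 30, 48, 80),
  from_centre = c(-40.5, -15.5, 2.5, 34.5),
  rel_score = c(0.95, 0.90, 0.95, 0.95)
)
pick_best(m, break_ties = "central")  # the tied match starting at 48
}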
}
\examples{
# Load example PFMs
data("ex_pfm")
# Cluster using default settings
cl_ids <- clusterMotifs(ex_pfm)
ex_cl <- split(ex_pfm, cl_ids)
# Add optional names
names(ex_cl) <- vapply(ex_cl, \(x) paste(names(x), collapse = "/"), character(1))

# Load example sequences
data("ar_er_seq")
# Get all matches for each cluster
getClusterMatches(ex_cl, ar_er_seq)
# Or just count them
countClusterMatches(ex_cl, ar_er_seq)
# Compare this to individual counts
countPwmMatches(ex_pfm, ar_er_seq)

}
