% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/readQFeatures.R
\name{readQFeatures}
\alias{readQFeatures}
\alias{readSummarizedExperiment}
\alias{readQFeatures,data.frame,data.frame}
\alias{readQFeatures,data.frame,vector}
\alias{readQFeatures,missing,vector}
\title{QFeatures from tabular data}
\usage{
readSummarizedExperiment(
  assayData,
  quantCols = NULL,
  fnames = NULL,
  ecol = NULL,
  ...
)

readQFeatures(
  assayData,
  colData = NULL,
  quantCols = NULL,
  runCol = NULL,
  name = "quants",
  removeEmptyCols = FALSE,
  verbose = TRUE,
  ecol = NULL,
  fnames = NULL,
  ...
)
}
\arguments{
\item{assayData}{A \code{data.frame}, or any object that can be coerced
into a \code{data.frame}, holding the quantitative assay. For
\code{readSummarizedExperiment()}, this can also be a
\code{character(1)} pointing to a filename. This \code{data.frame} is
typically generated by identification and quantification
software such as Sage, Proteome Discoverer or MaxQuant.}

\item{quantCols}{A \code{numeric()}, \code{logical()} or \code{character()}
defining the columns of the \code{assayData} that contain the
quantitative data. This information can also be defined in
\code{colData} (see details).}

\item{fnames}{For the single- and multi-set cases, an optional
\code{character(1)} or \code{numeric(1)} indicating the column to be used as
feature names.  Note that rownames must be unique within \code{QFeatures}
sets. Default is \code{NULL}. See also section 'Feature names'.}

\item{ecol}{Same as \code{quantCols}. Available for backwards
compatibility. Default is \code{NULL}. If both \code{ecol} and \code{colData}
are set, an error is thrown.}

\item{...}{Further arguments that can be passed on to \code{\link[=read.csv]{read.csv()}}
except \code{stringsAsFactors}, which is always \code{FALSE}. Only
applicable to \code{readSummarizedExperiment()}.}

\item{colData}{A \code{data.frame} (or any object that can be coerced
to a \code{data.frame}) containing sample/column annotations,
including \code{quantCols} and \code{runCol} (see details).}

\item{runCol}{For the multi-set case, a \code{numeric(1)} or
\code{character(1)} pointing to the column of \code{assayData} (and
\code{colData}, if set) that contains the runs/batches. Make sure
that the column names in both tables are identical and
syntactically valid (if you supply a \code{character}) or have the
same index (if you supply a \code{numeric}). Note that characters
are converted to syntactically valid names using \code{make.names}.}

\item{name}{For the single-set case, an optional \code{character(1)} to
name the set in the \code{QFeatures} object. Default is \code{"quants"}.}

\item{removeEmptyCols}{A \code{logical(1)}. If \code{TRUE}, quantitative
columns that contain only missing values are removed. Default is
\code{FALSE}.}

\item{verbose}{A \code{logical(1)} indicating whether the progress of
the data reading and formatting should be printed to the
console. Default is \code{TRUE}.}
}
\value{
An instance of class \code{QFeatures} or
\code{\link[SummarizedExperiment:SummarizedExperiment-class]{SummarizedExperiment::SummarizedExperiment()}}. For the
former, the quantitative sets of each run are stored as
\code{\link[SummarizedExperiment:SummarizedExperiment-class]{SummarizedExperiment::SummarizedExperiment()}} objects.
}
\description{
These functions convert tabular data into dedicated data
objects. The \code{\link[=readSummarizedExperiment]{readSummarizedExperiment()}} function takes a file
name or \code{data.frame} and converts it into a
\code{\link[=SummarizedExperiment]{SummarizedExperiment()}} object.  The \code{\link[=readQFeatures]{readQFeatures()}} function
takes a \code{data.frame} and converts it into a \code{QFeatures} object
(see \code{\link[=QFeatures]{QFeatures()}} for details). For the latter, two use-cases
exist:
\itemize{
\item The single-set case will generate a \code{QFeatures} object with a
single \code{SummarizedExperiment} containing all features of the
input table.
\item The multi-set case will generate a \code{QFeatures} object containing
multiple \code{SummarizedExperiment}s, resulting from splitting the
input table. This multi-set case is generally used when the
input table contains data from multiple runs/batches.
}
}
\details{
The single- and multi-set cases are defined by the \code{quantCols} and
\code{runCol} parameters, which can be passed as the \code{quantCols} and
\code{runCol} arguments and/or through the \code{colData} \code{data.frame}
(see below).
\subsection{Single-set case}{

The quantitative data variables are defined by the \code{quantCols}.
The single-set case can be represented schematically as shown
below.

\if{html}{\out{<div class="sourceCode">}}\preformatted{|------+----------------+-----------|
| cols | quantCols 1..N | more cols |
| .    | ...            | ...       |
| .    | ...            | ...       |
| .    | ...            | ...       |
|------+----------------+-----------|
}\if{html}{\out{</div>}}

Note that every \code{quantCols} column contains data for a single
sample. The single-set case is defined by the absence of any
\code{runCol} input (see next section). We here provide a
(non-exhaustive) list of typical data sets that fall under the
single-set case:
\itemize{
\item Peptide- or protein-level label-free data (bulk or single-cell).
\item Peptide- or protein-level multiplexed (e.g. TMT) data (bulk or
single-cell).
\item PSM-level multiplexed data acquired in a single MS run (bulk or
single-cell).
\item PSM-level data from fractionation experiments, where each
fraction of the same sample was acquired with the same
multiplexing label.
}
}

\subsection{Multi-set case}{

A run/batch variable, \code{runCol}, is required to import multi-set
data. The multi-set case can be represented schematically as shown
below.

\if{html}{\out{<div class="sourceCode">}}\preformatted{|--------+------+----------------+-----------|
| runCol | cols | quantCols 1..N | more cols |
|   1    | .    | ...            | ...       |
|   1    | .    | ...            | ...       |
|--------+------+----------------+-----------|
|   2    | .    | ...            | ...       |
|--------+------+----------------+-----------|
|   .    | .    | ...            | ...       |
|--------+------+----------------+-----------|
}\if{html}{\out{</div>}}

Every \code{quantCols} column contains data for multiple samples
acquired in different runs. The multi-set case applies when
\code{runCol} is provided, which will determine how the table is split
into multiple sets.

We here provide a (non-exhaustive) list of typical data sets that
fall under the multi-set case:
\itemize{
\item PSM- or precursor-level multiplexed data acquired in multiple
runs (bulk or single-cell).
\item PSM- or precursor-level label-free data acquired in multiple
runs (bulk or single-cell).
\item DIA-NN data (see also \code{\link[=readQFeaturesFromDIANN]{readQFeaturesFromDIANN()}}).
}
}

\subsection{Adding sample annotations with \code{colData}}{

We recommend providing sample annotations when creating a
\code{QFeatures} object. The \code{colData} is a table in which each row
corresponds to a sample and each column provides information about
the samples. There is no restriction on the number of columns or
on the type of data they contain. However, we require one or
two columns (depending on the use case) that link the
annotations of each sample to its quantitative data:
\itemize{
\item Single-set case: the \code{colData} must contain a column named
\code{quantCols} that provides the names of the columns in
\code{assayData} containing quantitative values for each sample (see
single-set cases in the examples).
\item Multi-set case: the \code{colData} must contain a column named
\code{quantCols} that provides the names of the columns in
\code{assayData} with the quantitative values for each sample, and a
column named \code{runCol} that provides the MS runs/batches in which
each sample has been acquired. The entries in
\code{colData[["runCol"]]} are matched against the entries provided
by \code{assayData[[runCol]]}.
}

When the \code{quantCols} argument is not provided to
\code{readQFeatures()}, the function will automatically determine the
\code{quantCols} from \code{colData[["quantCols"]]}. Therefore, \code{quantCols}
and \code{colData} cannot both be missing.

Samples that are present in \code{assayData} but absent from
\code{colData} will lead to a warning, and the missing entries will be
automatically added to the \code{colData} and filled with \code{NA}s.

When using the \code{quantCols} and \code{runCol} arguments only
(without \code{colData}), the \code{colData} contains zero
columns/variables.
}

\subsection{Feature names}{

Assay feature names (i.e. rownames) are important as they are used when
assays are joined with \code{\link[=joinAssays]{joinAssays()}}. They can be set upon creation of the
\code{\link[=QFeatures]{QFeatures()}} object by setting the \code{fnames} argument. See also
\code{\link[=createPrecursorId]{createPrecursorId()}} in case a precursor identifier is not readily
available and should be created from other, existing rowData variables.
}
}
\examples{

######################################
## Single-set case.

## Load a data.frame with PSM-level data
data(hlpsms)
hlpsms[1:10, c(1, 2, 10:11, 14, 17)]

## Create a QFeatures object with a single psms set
qf1 <- readQFeatures(hlpsms, quantCols = 1:10, name = "psms")
qf1
colData(qf1)

######################################
## Single-set case with colData.

(coldat <- data.frame(var = rnorm(10),
                      quantCols = names(hlpsms)[1:10]))
qf2 <- readQFeatures(hlpsms, colData = coldat)
qf2
colData(qf2)

######################################
## Multi-set case.

## Let's simulate 3 different files/batches for that same input
## data.frame, and define a colData data.frame.

hlpsms$file <- paste0("File", sample(1:3, nrow(hlpsms), replace = TRUE))
hlpsms[1:10, c(1, 2, 10:11, 14, 17, 29)]

qf3 <- readQFeatures(hlpsms, quantCols = 1:10, runCol = "file")
qf3
colData(qf3)


######################################
## Multi-set case with colData.

(coldat <- data.frame(runCol = rep(paste0("File", 1:3), each = 10),
                      var = rnorm(10),
                      quantCols = names(hlpsms)[1:10]))
qf4 <- readQFeatures(hlpsms, colData = coldat, runCol = "file")
qf4
colData(qf4)
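
######################################
## readSummarizedExperiment() with feature names.

## A minimal sketch on a small, made-up table: the 'toy' data.frame
## and its column names ('id', 's1', 's2') are illustrative
## assumptions, not data shipped with the package. The 'id' column
## is used as feature names through the 'fnames' argument.
toy <- data.frame(id = paste0("pep", 1:3),
                  s1 = c(1, 2, 3),
                  s2 = c(4, 5, 6))
se <- readSummarizedExperiment(toy, quantCols = c("s1", "s2"),
                               fnames = "id")
se
rownames(se)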
}
\seealso{
\itemize{
\item The \code{QFeatures} class (see \code{\link[=QFeatures]{QFeatures()}}) to read about how to
manipulate the resulting \code{QFeatures} object.
\item The \code{\link[=readQFeaturesFromDIANN]{readQFeaturesFromDIANN()}} function to import DIA-NN
quantitative data.
}
}
\author{
Laurent Gatto, Christophe Vanderaa
}
