\name{text2bin}
\alias{text2bin}
\alias{bin2text}
\title{Convert RI files from text to binary format and vice versa}
\description{
    These functions convert a list of RI files (also known as peak list files) from text to
    binary format and vice versa.
}
\usage{
    text2bin(in.files, out.files=NULL, columns=NULL)

    bin2text(in.files, out.files=NULL)
}
\arguments{
    \item{in.files}{A character vector of file paths to the input RI files. }

    \item{out.files}{A character vector of file paths. If \code{NULL}, the input file extensions
        will be changed accordingly ("txt" to "dat" and vice versa).}

    \item{columns}{Either a numeric vector with the positions of the columns for \code{SPECTRUM},
        \code{RETENTION_TIME_INDEX}, and \code{RETENTION_TIME}; or a character vector with the
        respective names of those columns. If \code{NULL}, default (hard-coded) values are
        used. If a numeric vector is given, the column positions are zero-based (the first column
        is zero). In addition, the columns can be set by a global option (see details below).}
}
\details{
    These functions convert a list of RI files from text to binary representation and vice versa.
    The format of the input files is detected dynamically, and an error is issued for invalid files.

    Converting a binary file to text can be useful if you need to inspect what an RI file looks
    like on the inside (for example, to check that the peak detection was correct). On the
    other hand, converting a text file to binary is highly recommended, because the binary format
    is faster to parse than the text format.

    For text files, the order of the columns is important (see argument \code{columns} above). The
    first entry refers to the spectrum column, followed by the retention time index and the
    retention time. If the column names are other than \code{SPECTRUM}, \code{RETENTION_TIME_INDEX},
    and \code{RETENTION_TIME}, use the respective column names or the column positions starting at
    zero (first column is zero, second is one, and so on).

    Many functions rely on those column names, and passing them as arguments to each function
    is tedious, so the global option \code{TS_RI_columns} can be set at the beginning, for example:

    \preformatted{
        # using column names
        options(TS_RI_columns=c('spec_column', 'RI_column', 'RT_column'))

        # using column indices (zero-based!)
        options(TS_RI_columns=c(1, 2, 0))
    }

    where "spec_column", "RI_column", and "RT_column" are the names of the spectrum, retention
    time index, and retention time columns.

    This option is useful if your RI files were generated by other software. However, it
    is highly recommended to simply convert those custom RI files into \code{\link{TargetSearch}}'s binary
    format and not worry about column names.
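
    As a rough illustration of the two conventions (names versus zero-based positions), the
    following base R sketch resolves a \code{columns} value to one-based R positions. The helper
    \code{resolve_columns} and the example header are hypothetical and are not part of
    TargetSearch; the package's actual handling may differ.

    \preformatted{
        # hypothetical helper, for illustration only (not a TargetSearch function)
        # returns one-based positions in the order: spectrum, RI, RT
        resolve_columns <- function(header, columns) {
            if (is.character(columns)) {
                pos <- match(columns, header)      # look up names in the header
            } else {
                pos <- as.integer(columns) + 1L    # zero-based to one-based
            }
            stopifnot(length(pos) == 3, !anyNA(pos), pos <= length(header))
            pos
        }

        # assumed header of a custom RI file
        header <- c("RT", "SPEC", "RI")
        by_name <- resolve_columns(header, c("SPEC", "RI", "RT"))
        by_pos  <- resolve_columns(header, c(1, 2, 0))
        identical(by_name, by_pos)
    }

    Both calls select the same three columns, which is why names and zero-based positions can
    be used interchangeably.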
}
\section{File Format}{
    The so-called RI files contain lists of m/z peaks detected for every ion trace measured in
    the samples. Historically, the file format was a simple tab-delimited text file in the format
    described below. Note that the column order may differ and additional columns may be
    present; any additional columns are ignored.

    \tabular{lll}{
        RETENTION_TIME \tab SPECTRUM                          \tab RETENTION_TIME_INDEX\cr
        212.46         \tab 250:26 256:26 316:27              \tab 221029.7\cr
        212.51         \tab 114:46 162:30 251:27              \tab 221081.3\cr
        212.56         \tab 319:25                            \tab 221132.9\cr
        212.61         \tab 95:38 108:30 262:32 266:27 292:25 \tab 221184.5
    }

    The retention time is usually represented in seconds, while the retention time index is in
    arbitrary units, which depend on the retention time correction standard method (in the table
    above it is in milliseconds, but other units can be used).

    The spectrum column is represented by pairs of m/z and raw intensity (peak height), similar
    to the representation of a metabolite library (see \code{\link{ImportLibrary}}). Thus each
    pair corresponds to a peak of the respective ion trace.
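
    The pair notation can be unpacked with base R, as in the sketch below. The helper
    \code{parse_spectrum} is hypothetical and only illustrates the text layout shown in the table;
    it is not how TargetSearch reads these files internally.

    \preformatted{
        # hypothetical helper, for illustration only (not a TargetSearch function)
        # splits one SPECTRUM entry into its m/z and intensity values
        parse_spectrum <- function(x) {
            tokens <- strsplit(x, " ", fixed=TRUE)[[1]]   # one token per peak
            pairs  <- strsplit(tokens, ":", fixed=TRUE)   # m/z and intensity
            data.frame(
                mz        = as.numeric(vapply(pairs, function(p) p[1], character(1))),
                intensity = as.numeric(vapply(pairs, function(p) p[2], character(1)))
            )
        }

        # first row of the table above: three peaks at m/z 250, 256 and 316
        peaks <- parse_spectrum("250:26 256:26 316:27")
    }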

    The disadvantage of text files is that they are slow to parse, so a binary format was created
    that stores the peak data as binary vectors, which are much faster to read. These files use
    the extension \code{dat}.
}
\note{
    Beware that the respective \code{\linkS4class{tsSample}} object may need to be updated by using
    the method \code{\link{fileFormat}}.
}
\value{
    A character vector with the paths of the created files. The value may be returned invisibly.
}
\examples{
    require(TargetSearchData)
    # take three example files from package TargetSearchData
    in.files <- tsd_rifiles()[1:3]
    # out files to current directory
    out.files <- sub(".txt", ".dat", basename(in.files), fixed=TRUE)
    # convert to binary format
    res <- text2bin(in.files, out.files)
    stopifnot(res == out.files)

    # convert back to text
    res <- bin2text(out.files)
    stopifnot(res == basename(in.files))

    # Demonstrate how to use the `columns` option
    # make dummy RI file with arbitrary column names and save it
    tmp <- data.frame(RT=c(101.5,102.5), SPEC=c('12:100 23:100', '114:46 162:30'), RI=c(300, 400) + .75)

    # file must be tab-delimited, unquoted strings and no row names
    RI_test <- tempfile(fileext=".txt")
    write.table(tmp, file=RI_test, sep="\t", quote=FALSE, row.names=FALSE)

    # convert this text file to binary format
    ## wrong! It fails because of invalid columns
    # text2bin(RI_test)

    # correct! The columns are correct
    text2bin(RI_test, columns=c('SPEC', 'RI', 'RT'))

    # same example but using integers (not recommended)
    text2bin(RI_test, columns=c(1, 2, 0)) # note they start from zero.

    # Alternatively, set a global option (so it can be used in a session)
    opt <- options(TS_RI_columns=c('SPEC', 'RI', 'RT'))
    text2bin(RI_test)

    # or using integers (again, not recommended)
    options(TS_RI_columns=c(1, 2, 0))
    text2bin(RI_test)

    # unset options
    options(opt)
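
    # A minimal base R sketch for inspecting a text RI file (this is not a
    # TargetSearch function): read.delim() parses tab-delimited files with a
    # header row. The file `ri_demo` and its contents are hypothetical and are
    # created only for this illustration, using the default column names.
    ri_demo <- tempfile(fileext=".txt")
    write.table(data.frame(RETENTION_TIME=c(212.46, 212.51),
                           SPECTRUM=c('250:26 256:26 316:27', '114:46 162:30 251:27'),
                           RETENTION_TIME_INDEX=c(221029.7, 221081.3)),
                file=ri_demo, sep="\t", quote=FALSE, row.names=FALSE)
    ri_peek <- read.delim(ri_demo, stringsAsFactors=FALSE)
    # show the column types and the first values
    str(ri_peek)
    unlink(ri_demo)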
}
\author{Alvaro Cuadros-Inostroza}
\seealso{\code{\link{ImportSamples}}, \code{\linkS4class{tsSample}},
    \code{\link{RIcorrect}}}

% vim: set ts=4 sw=4 et:
