\name{getSampleData}
\alias{getSampleData}
\alias{setSampleData}
\title{Reading and writing sample data from and to a tally file}
\description{
These functions read sample data from, and write sample data to,
HDF5-based tally files. By default the sample data is stored as
attributes of the group; with \code{largeAttributes = TRUE} it is
stored in a separate dataset instead.
}
\usage{
getSampleData( filename, group )
setSampleData( filename, group, sampleData, largeAttributes = FALSE, stringSize = 64 )
}
\arguments{
\item{filename}{ The name of a tally file }
\item{group}{ The name of a group within that tally file, e.g. \code{/ExampleStudy/22} }
\item{sampleData}{A \code{data.frame} with \code{k} rows (one for each
  sample) and columns \code{Type}, \code{Column} and either \code{SampleGroup}
  or \code{Patient}. Additional columns are written as well but are not required.
}
\item{largeAttributes}{HDF5 limits the size of attributes to 64KB. If you have many samples, setting this flag writes the sample data to a separate dataset instead of group attributes. \code{getSampleData} is aware of this and automatically reads the dataset-stored sample data if it is present.}
\item{stringSize}{Maximum length (number of characters) of string attributes. The default of 64 characters should be sufficient in most cases. This value has to be specified because variable-length strings are currently not supported.}
}
\details{
The returned \code{data.frame} contains the sample ids and the column of
each sample within the sample dimension of the datasets.
To be used with the provided SNV calling function, the sample type
must be one of \code{c("Case","Control")}.
Additional per-sample information may be stored as further columns.

The following columns are required in the sample data, where each row represents one sample of the cohort:

\code{Sample}: the sample id of the corresponding sample

\code{Column}: the index of the corresponding sample within the sample dimension of the datasets. Note that \code{getSampleData} adds \code{1} to this value and \code{setSampleData} subtracts \code{1} from it, because the tally files store this index 0-based whereas R indexing is 1-based.

\code{Patient}: the patient id of the corresponding sample

\code{Type}: the type of sample

}
\value{
\item{sampleData}{A \code{data.frame} with \code{k} rows (one for each
  sample) and columns \code{Type}, \code{Column} and either \code{SampleGroup}
  or \code{Patient}, plus any additional columns stored in the tally file.
}
}
\author{
Paul Pyl
}

\examples{
  # loading library and example data
  library(h5vc)
  # Work on a temporary copy of the example file so that the original stays intact.
  tallyFile <- tempfile()
  stopifnot(file.copy(system.file( "extdata", "example.tally.hfs5", package = "h5vcData" ), tallyFile))
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  sampleData
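  # Sketch (not part of the h5vc API): check that the loaded sample data
  # has the required columns listed under Details and that every Type is
  # "Case" or "Control", as expected by the provided SNV calling function.
  # is.element() is used instead of the infix match operator to keep the
  # example free of characters that need escaping in Rd files.
  requiredColumns <- c( "Sample", "Column", "Patient", "Type" )
  stopifnot( all( is.element( requiredColumns, colnames( sampleData ) ) ) )
  stopifnot( all( is.element( sampleData$Type, c( "Case", "Control" ) ) ) )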
  # modify the sample data
  sampleData$AnotherColumn <- paste( sampleData$Patient, "Modified" )
  # write to tallyFile
  setSampleData( tallyFile, "/ExampleStudy/16", sampleData )
  # re-load and check if it worked
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  sampleData
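  # Sketch with hypothetical values (sample and patient names are made up):
  # a minimal sample data.frame for a two-sample cohort with the required
  # columns. Column holds 1-based R indices into the sample dimension;
  # setSampleData stores Column - 1 (0-based) in the tally file and
  # getSampleData adds the 1 back when reading. This data.frame is not
  # written to the example file, whose datasets have a different sample count.
  minimalSampleData <- data.frame(
    Sample  = c( "PatientA_Tumour", "PatientA_Normal" ),
    Column  = c( 1L, 2L ),
    Patient = c( "PatientA", "PatientA" ),
    Type    = c( "Case", "Control" ),
    stringsAsFactors = FALSE
  )
  # the 0-based values as they would be stored in the tally file
  storedColumn <- minimalSampleData$Column - 1L
  storedColumn
  # converting back yields the 1-based values seen in R
  stopifnot( identical( storedColumn + 1L, minimalSampleData$Column ) )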
}
