% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cluster-var.R
\name{cluster_var}
\alias{cluster_var}
\title{Build Hierarchical Tree based on Hierarchical Clustering}
\usage{
cluster_var(x = NULL, d = NULL, block = NULL, method = "average",
  use = "pairwise.complete.obs", sort.parallel = TRUE,
  parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL)
}
\arguments{
\item{x}{a matrix or list of matrices for multiple data sets. The matrix or
matrices have to be of type numeric and are required to have column names
/ variable names. The rows and the columns represent the observations and
the variables, respectively. Either the argument \code{x} or \code{d} has
to be specified.}

\item{d}{a dissimilarity matrix. This can be either a symmetric matrix of
type numeric with column and row names or an object of class
\code{\link{dist}} with labels. Either the argument \code{x} or \code{d} has
to be specified.}

\item{block}{a data frame or matrix specifying the second level of the
hierarchical tree. The first column is required to contain the
variable names and to be of type character. The second column is required to
contain the group assignment and to be a vector of type character or numeric.
If not supplied, the second level is built based on the
data.}

\item{method}{the agglomeration method to be used for the hierarchical
clustering. See \code{\link{hclust}} for details.}

\item{use}{the method to be used for computing correlations in the presence
of missing values. This is important for multiple data sets which do not measure
exactly the same variables. If the data are specified using the argument \code{x}, the
dissimilarity matrix for the hierarchical clustering is calculated using
correlation. See the 'Details' section and \code{\link{cor}} for all the options.}

\item{sort.parallel}{a logical indicating whether the blocks are sorted with respect to
their size. This can reduce the run time for parallel computation.}

\item{parallel}{type of parallel computation to be used. See the 'Details' section.}

\item{ncpus}{number of processes to be run in parallel.}

\item{cl}{an optional \strong{parallel} or \strong{snow} cluster used if
\code{parallel = "snow"}. If not supplied, a cluster on the local machine is created.}
}
\value{
The returned value is an object of class \code{"hierD"},
consisting of two elements, the argument \code{"block"} and the
hierarchical tree \code{"res.tree"}.

The element \code{"block"} defines the second level of the hierarchical
tree if supplied.

The element \code{"res.tree"} contains a \code{\link{dendrogram}}
for each of the blocks defined in the argument \code{block}.
If the argument \code{block} is \code{NULL} (i.e. not supplied),
the element contains only one \code{\link{dendrogram}}.
}
\description{
Build a hierarchical tree based on hierarchical clustering of the variables.
}
\details{
The hierarchical tree is built by hierarchical clustering of the variables.
Either the data (using the argument \code{x}) or a dissimilarity matrix
(using the argument \code{d}) can be specified.

If one or multiple data sets are defined using the argument \code{x},
the dissimilarity matrix is calculated as one minus the squared empirical
correlation. In the case of multiple data sets, a single hierarchical
tree is jointly estimated using hierarchical clustering. The argument
\code{use} is important because missing values are introduced if the
data sets do not measure exactly the same variables. The argument
\code{use} determines how the empirical correlation is calculated.

Alternatively, it is possible to specify a user-defined dissimilarity
matrix using the argument \code{d}.

If the arguments \code{x} and \code{block} are supplied, i.e. the argument
\code{block} defines the second level of the
hierarchical tree, the function can be run in parallel across
the different blocks by specifying the arguments \code{parallel} and
\code{ncpus}. There is an optional argument \code{cl} if
\code{parallel = "snow"}. There are three possibilities to set the
argument \code{parallel}: \code{parallel = "no"} for serial evaluation
(default), \code{parallel = "multicore"} for parallel evaluation
using forking, and \code{parallel = "snow"} for parallel evaluation
using a parallel socket cluster. It is recommended to select
\code{\link{RNGkind}("L'Ecuyer-CMRG")} and set a seed to ensure that
the parallel computing of the package \code{hierinf} is reproducible.
This way each processor gets a different substream of the pseudo-random
number generator stream, which makes the results reproducible if the arguments
(such as \code{sort.parallel} and \code{ncpus}) remain unchanged. See the vignette
or the reference for more details.
}
\examples{
library(MASS)
x <- mvrnorm(200, mu = rep(0, 500), Sigma = diag(500))
colnames(x) <- paste0("Var", 1:500)
dendr1 <- cluster_var(x = x)
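
# Sketch only: the Details section states that, for data supplied via x, the
# dissimilarity is one minus the squared empirical correlation. The lines
# below reproduce that description with base R on a small simulated matrix.
# They are an illustration and not necessarily the internal code of
# cluster_var. The names x.small, d.sketch and tree.sketch are hypothetical.
x.small <- matrix(rnorm(200 * 5), nrow = 200,
                  dimnames = list(NULL, paste0("Var", 1:5)))
d.sketch <- 1 - cor(x.small, use = "pairwise.complete.obs")^2
tree.sketch <- as.dendrogram(hclust(as.dist(d.sketch), method = "average"))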

# The column names of the data frame block are optional.
block <- data.frame("var.name" = paste0("Var", 1:500),
                    "block" = rep(c(1, 2), each = 250),
                    stringsAsFactors = FALSE)
dendr2 <- cluster_var(x = x, block = block)
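
# Sketch of a run that is parallelised across the two blocks defined above.
# Assumptions: two local processes are available, and the socket cluster is
# created on the local machine because the argument cl is not supplied. The
# RNG kind and the seed follow the recommendation in the Details section;
# the seed value 3 is arbitrary and the name dendr2.par is hypothetical.
RNGkind("L'Ecuyer-CMRG")
set.seed(3)
dendr2.par <- cluster_var(x = x, block = block,
                          parallel = "snow", ncpus = 2L)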

# The matrix x is first transposed because the function dist calculates
# distances between the rows.
d <- dist(t(x))
dendr3 <- cluster_var(d = d, method = "single")
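
# A user-defined dissimilarity can also be passed as a symmetric numeric
# matrix with row and column names instead of an object of class dist.
# Illustration only: one minus the absolute correlation is an arbitrary
# choice of dissimilarity, and the names d.mat and dendr4 are hypothetical.
d.mat <- 1 - abs(cor(x))
dendr4 <- cluster_var(d = d.mat)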

}
\references{
Renaux, C. et al. (2018), Hierarchical inference for genome-wide
association studies: a view on methodology with software. (arXiv:1805.02988)
}
\seealso{
\code{\link{cluster_position}} and
\code{\link{test_hierarchy}}.
}
