% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/metadata.R
\name{get_metadata}
\alias{get_metadata}
\title{Gets the Curated Atlas metadata as a data frame.}
\usage{
get_metadata(
  remote_url = DATABASE_URL,
  cache_directory = get_default_cache_dir(),
  use_cache = TRUE
)
}
\arguments{
\item{remote_url}{Optional character vector of length 1. An HTTP URL pointing
to the location of the parquet database.}

\item{cache_directory}{Optional character vector of length 1. A file path on
your local system to a directory (not a file) that will be used to store
\code{metadata.parquet}.}

\item{use_cache}{Optional logical scalar. If \code{TRUE} (the default), and this
function has been called before with the same parameters, then a cached
reference to the table will be returned. If \code{FALSE}, a new connection will
be created no matter what.}
}
\value{
A lazy data.frame subclass containing the metadata. You can interact
with this object using most standard dplyr functions. For string matching,
it is recommended that you use \code{stringr::str_like} to filter character
columns, as \code{stringr::str_match} will not work.
}
\description{
Downloads a parquet database of the Human Cell Atlas metadata to a local
cache, and then opens it as a data frame. It can then be filtered and passed
into \code{\link[=get_single_cell_experiment]{get_single_cell_experiment()}} to obtain a
\code{\link[SingleCellExperiment:SingleCellExperiment]{SingleCellExperiment::SingleCellExperiment}}.
}
\details{
The metadata was collected from the Bioconductor package \code{cellxgenedp}. Its
vignette \code{using_cellxgenedp} provides an overview of the columns in the
metadata. The data for which the column \code{organism_name} included "Homo
sapiens" was collected from \code{cellxgenedp}.

The columns \code{dataset_id} and \code{file_id} link the datasets explorable through
\code{CuratedAtlasQueryR} and \code{cellxgenedp} to the CELLxGENE portal.

Our representation harmonises the metadata at dataset, sample and cell
levels in a single coherent database table.

Dataset-specific columns (definitions available at cellxgene.cziscience.com)

\code{cell_count}, \code{collection_id}, \code{created_at.x}, \code{created_at.y},
\code{dataset_deployments}, \code{dataset_id}, \code{file_id}, \code{filename}, \code{filetype},
\code{is_primary_data.y}, \code{is_valid}, \code{linked_genesets}, \code{mean_genes_per_cell},
\code{name}, \code{published}, \code{published_at}, \code{revised_at}, \code{revision}, \code{s3_uri},
\code{schema_version}, \code{tombstone}, \code{updated_at.x}, \code{updated_at.y},
\code{user_submitted}, \code{x_normalization}

Sample-specific columns (definitions available at cellxgene.cziscience.com)

\code{sample_}, \code{.sample_name}, \code{age_days}, \code{assay}, \code{assay_ontology_term_id},
\code{development_stage}, \code{development_stage_ontology_term_id}, \code{ethnicity},
\code{ethnicity_ontology_term_id}, \code{experiment___}, \code{organism},
\code{organism_ontology_term_id}, \code{sample_placeholder}, \code{sex},
\code{sex_ontology_term_id}, \code{tissue}, \code{tissue_harmonised},
\code{tissue_ontology_term_id}, \code{disease}, \code{disease_ontology_term_id},
\code{is_primary_data.x}

Cell-specific columns (definitions available at cellxgene.cziscience.com)

\code{cell_}, \code{cell_type}, \code{cell_type_ontology_term_id}, \code{cell_type_harmonised},
\code{confidence_class}, \code{cell_annotation_azimuth_l2},
\code{cell_annotation_blueprint_singler}

Through harmonisation and curation we introduced custom columns that are not
present in the original CELLxGENE metadata:
\itemize{
\item \code{tissue_harmonised}: a coarser tissue name for better filtering
\item \code{age_days}: the number of days corresponding to the age
\item \code{cell_type_harmonised}: the consensus cell identity (for immune cells),
derived from the original annotation and three novel annotations produced with
Seurat Azimuth and SingleR
\item \code{confidence_class}: an ordinal class describing how confident the
\code{cell_type_harmonised} call is. 1 is complete consensus, 2 is three out of
four, and so on.
\item \code{cell_annotation_azimuth_l2}: Azimuth cell annotation
\item \code{cell_annotation_blueprint_singler}: SingleR cell annotation using
Blueprint reference
\item \code{cell_annotation_blueprint_monaco}: SingleR cell annotation using Monaco
reference
\item \code{sample_id_db}: Sample subdivision for internal use
\item \code{file_id_db}: File subdivision for internal use
\item \code{sample_}: Sample ID
\item \code{.sample_name}: How samples were defined
}

\strong{Possible cache path issues}

If your default R cache path includes non-standard characters (e.g. a dash
because of your user or organisation name), the following error can occur:

Error in \code{db_query_fields.DBIConnection()}: ! Can't query fields. Caused by
error: ! Parser Error: syntax error at or near "/" LINE 2: FROM
/Users/bob/Library/Caches...

The solution is to choose a different cache directory, for example:

\code{get_metadata(cache_directory = path.expand('~'))}
}
\examples{
library(dplyr)
filtered_metadata <- get_metadata() |>
    filter(
        ethnicity == "African" &
            assay \%LIKE\% "\%10x\%" &
            tissue == "lung parenchyma" &
            cell_type \%LIKE\% "\%CD4\%"
    )
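
# A minimal sketch, not taken from the package's own examples: it uses the
# custom harmonised columns described in the details section. It assumes
# confidence_class is stored as a number, with 1 meaning complete consensus
# between annotations. `consensus_counts` is a hypothetical name used only
# for illustration.
library(dplyr)
consensus_counts <- get_metadata() |>
    # Keep only cells whose harmonised cell type has complete consensus
    filter(confidence_class == 1) |>
    # Count cells per coarse tissue and harmonised cell type; the query
    # stays lazy until the result is printed or collected
    count(tissue_harmonised, cell_type_harmonised)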

}
