# Developer notes

This document describes the general structure of the package and points
contributors to the relevant code and files. Please read the **full**
document.


## General info

**What is this package good for?**

- The [Spectra](https://github.com/rformassspectrometry/Spectra) package (and
  the `Spectra` class) provides a powerful infrastructure for mass spectrometry
  (MS) data in R. See the
  [SpectraTutorials](https://jorainer.github.io/SpectraTutorials/) for more
  information, in particular the
  [Spectra-backends](https://jorainer.github.io/SpectraTutorials/articles/Spectra-backends.html)
  vignette for a description of the data structure.

- Powerful MS data algorithms are also available in Python, e.g. in the
  [*matchms*](https://github.com/matchms) and
  [*spectrum_utils*](https://github.com/bittremieux-lab/spectrum_utils) libraries.

- Why re-implement what's already available?

- This package *translates* an R `Spectra` object into Python MS data
  structures, calls similarity scoring and processing/filtering functions of
  the *matchms* library on them, and translates the results back into R data
  objects.


## General package structure

**Where to find what?**

- The *R* folder contains all R source files.

	- *R/conversion.R* contains functions to convert between R and Python data
	  structures (e.g. between `Spectra::Spectra` and `matchms.Spectrum`). The
	  conversion of the Python result into an R data type is handled by R's
	  *reticulate* package, which can convert all basic data types between R and
	  Python.

	- *R/compareSpectriPy.R* contains the mass spectral similarity calculation
	  functions. The core function is the internal
	  [`.compare_spectra_python()`](https://github.com/rformassspectrometry/SpectriPy/blob/main/R/compareSpectriPy.R#L304-L333)
	  function that manages the Anaconda environment, translates the data to
	  Python data structures and calls the Python command using
	  `py_run_string()`. The Python command itself is generated by the
	  `python_command()` method
	  (e.g. [this](https://github.com/rformassspectrometry/SpectriPy/blob/main/R/compareSpectriPy.R#L255-L266))
	  called on a *parameter* object such as
	  [`CosineGreedyParam`](https://github.com/rformassspectrometry/SpectriPy/blob/main/R/compareSpectriPy.R#L132-L153). To
	  support a new similarity calculation function or other Python
	  functionality, ideally implement a new *param* object with a
	  `python_command()` method that returns the Python command specific to
	  that algorithm.

- The *tests* folder contains all unit tests: a general *testthat.R* file that
  configures and sets up the tests and, within the *testthat* folder, one unit
  test file for each R source file (named *test_<R-source-file>.R*).

- The *vignettes* folder contains a Quarto document that explains the use of
  the *SpectriPy* package using examples. This is a good starting point to
  explore the package and its functionality.
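
The param-object pattern described above for *R/compareSpectriPy.R* can be
illustrated with a minimal sketch. It is written in Python purely for
illustration: the package implements this pattern in R, and the class below is
a hypothetical stand-in, not the package's code. The `CosineGreedy` arguments
(`tolerance`, `mz_power`, `intensity_power`) and `matchms.calculate_scores()`
come from the *matchms* library; the variable names `py_x` and `py_y` are
assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CosineGreedyParam:
    """Hypothetical stand-in for the R parameter object (illustration only)."""

    tolerance: float = 0.1
    mz_power: float = 0.0
    intensity_power: float = 1.0

    def python_command(self, refs: str = "py_x", queries: str = "py_y") -> str:
        """Return the Python source that would run this algorithm.

        `refs` and `queries` are the (assumed) names of the Python variables
        holding the translated spectra.
        """
        args = (
            f"tolerance={self.tolerance!r}, "
            f"mz_power={self.mz_power!r}, "
            f"intensity_power={self.intensity_power!r}"
        )
        return (
            "import matchms\n"
            "from matchms.similarity import CosineGreedy\n"
            f"res = matchms.calculate_scores({refs}, {queries}, "
            f"CosineGreedy({args}))\n"
        )


param = CosineGreedyParam(tolerance=0.05)
command = param.python_command()

# Only check that the generated text is syntactically valid Python; actually
# running it would require matchms and the translated spectra.
compile(command, "<generated>", "exec")
print(command)
```

Supporting another algorithm then amounts to adding one more parameter class
with its own `python_command()`; the code that sets up the environment and
runs the command does not need to change.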


## Python setup and configuration

**Where are Python libraries defined?**

- *SpectriPy* uses the R *reticulate* package for conversion between (basic) R
  and Python data types.

- The *reticulate* `r_to_py()` and `py_to_r()` functions perform this
  conversion from R to Python and back. To use these functions, a Python
  environment with the *matchms* library installed must be available.


## Test data

**What data could be used in tests?**

- The package contains two test data files. The "test" and "spectra2" example
  data were created *manually* by defining *m/z* and intensity values of MS
  peaks. Additional data files (e.g. in MGF format) can be added to the
  *inst/extdata* folder if needed.

- Alternatively, example files in mzML format are available in
  Bioconductor's [*msdata*](https://bioconductor.org/packages/msdata)
  package.

- To test the package and newly created functionality, add the respective unit
  tests to the *tests/testthat* folder and run them, e.g. with
  `rcmdcheck::rcmdcheck(args = "--no-manual")` in an R session started within
  the package folder.
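
Because the example spectra are defined manually, expected similarity values
for unit tests can be worked out independently of *matchms*. The following is
a simplified, pure-Python sketch of a greedy cosine score. The function name
`greedy_cosine`, the peak values and the tolerance are invented for
illustration. It is not the *matchms* implementation and ignores, for example,
*m/z* and intensity weighting, so results may differ in edge cases.

```python
from math import sqrt


def greedy_cosine(spec_a, spec_b, tolerance=0.1):
    """Simplified greedy cosine score between two peak lists.

    Each spectrum is a list of (mz, intensity) tuples.
    """
    # All candidate peak pairs whose m/z values differ by at most `tolerance`,
    # stored with the product of their intensities.
    candidates = [
        (int_a * int_b, i, j)
        for i, (mz_a, int_a) in enumerate(spec_a)
        for j, (mz_b, int_b) in enumerate(spec_b)
        if abs(mz_a - mz_b) <= tolerance
    ]
    # Greedy assignment: largest products first, each peak used at most once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    total = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += product
    # Normalize by the Euclidean norms of both intensity vectors.
    norm_a = sqrt(sum(intensity ** 2 for _, intensity in spec_a))
    norm_b = sqrt(sum(intensity ** 2 for _, intensity in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return total / (norm_a * norm_b)


# Two hand-made spectra: only the peaks near m/z 100 match within 0.1.
spec_a = [(100.0, 3.0), (200.0, 4.0)]
spec_b = [(100.05, 3.0), (300.0, 4.0)]

score = greedy_cosine(spec_a, spec_b, tolerance=0.1)
print(score)  # 3 * 3 / (5 * 5) = 0.36
```

Small spectra like these keep the expected value easy to verify by hand, which
makes a failing test straightforward to interpret.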


## Potential contributions and extensions

**What could be implemented?**

See the open issues. Some major topics are listed below.

- Integrate other Python libraries? This is more of a discussion point, see
  [issue
  #24](https://github.com/rformassspectrometry/SpectriPy/issues/24).

- Integrate functionality for spectra processing, downstream analysis
  (e.g. cleaning), ... See also [issue 
  #20](https://github.com/rformassspectrometry/SpectriPy/issues/20).

- Ability to translate additional data structures. See also [issue
  #18](https://github.com/rformassspectrometry/SpectriPy/issues/18).

- Define a use case analysis (or ideally several): show how data can be analyzed
  with the *SpectriPy* package using a Quarto document that directly combines
  R and Python code. See also [issue
  #21](https://github.com/rformassspectrometry/SpectriPy/issues/21).


## Contributing

**How to contribute?**

- Ideally, fork the GitHub repository, implement your extensions and open a
  pull request against the *main* branch.

- Follow the [coding style
  guidelines](https://rformassspectrometry.github.io/RforMassSpectrometry/articles/RforMassSpectrometry.html#coding-style)
  and adhere to the [code of
  conduct](https://rformassspectrometry.github.io/RforMassSpectrometry/articles/RforMassSpectrometry.html#code-of-conduct).
