Clean files generated by signal processing tools.

Clean DIAUmpire files

Clean MaxQuant files

Clean OpenMS files

Clean OpenSWATH files

Clean Progenesis files

Clean ProteomeDiscoverer files

Clean Skyline files

Clean SpectroMine files

Clean Spectronaut files

MSstatsClean(msstats_object, ...)

# S4 method for MSstatsDIAUmpireFiles
MSstatsClean(msstats_object, use_frag, use_pept)

# S4 method for MSstatsMaxQuantFiles
MSstatsClean(
  msstats_object,
  protein_id_col,
  remove_by_site = FALSE,
  channel_columns = "Reporterintensitycorrected"
)

# S4 method for MSstatsOpenMSFiles
MSstatsClean(msstats_object)

# S4 method for MSstatsOpenSWATHFiles
MSstatsClean(msstats_object)

# S4 method for MSstatsProgenesisFiles
MSstatsClean(msstats_object, runs, fix_colnames = TRUE)

# S4 method for MSstatsProteomeDiscovererFiles
MSstatsClean(
  msstats_object,
  quantification_column,
  protein_id_column,
  sequence_column,
  remove_shared,
  remove_protein_groups = TRUE,
  intensity_columns_regexp = "Abundance"
)

# S4 method for MSstatsSkylineFiles
MSstatsClean(msstats_object)

# S4 method for MSstatsSpectroMineFiles
MSstatsClean(msstats_object)

# S4 method for MSstatsSpectronautFiles
MSstatsClean(msstats_object, intensity)
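The usage above relies on S4 dispatch: one generic takes `msstats_object` plus `...`, and each method may declare its own extra arguments. The following is a minimal sketch of that pattern, not the package's implementation. The class names (`DemoInputFiles`, `DemoMaxQuantFiles`, `DemoSkylineFiles`) and the generic `demoClean` are hypothetical stand-ins, and the method bodies only return a label instead of cleaning any data.

```r
library(methods)

# Hypothetical stand-ins for the MSstatsInputFiles class hierarchy.
setClass("DemoInputFiles", representation(files = "list"))
setClass("DemoMaxQuantFiles", contains = "DemoInputFiles")
setClass("DemoSkylineFiles", contains = "DemoInputFiles")

# The generic keeps `...` so that methods can add tool-specific arguments.
setGeneric("demoClean", function(demo_object, ...) standardGeneric("demoClean"))

# Method with extra arguments, analogous to the MaxQuant signature.
setMethod("demoClean", "DemoMaxQuantFiles",
          function(demo_object, protein_id_col, remove_by_site = FALSE) {
            paste("MaxQuant cleaning using", protein_id_col)
          })

# Method without extra arguments, analogous to the Skyline signature.
setMethod("demoClean", "DemoSkylineFiles",
          function(demo_object) {
            "Skyline cleaning"
          })

mq = new("DemoMaxQuantFiles", files = list())
sky = new("DemoSkylineFiles", files = list())

# The class of the first argument selects the method.
mq_result = demoClean(mq, protein_id_col = "Proteins")
sky_result = demoClean(sky)
```

Because dispatch depends only on the class of the first argument, the same call shape works for every supported tool, and only the extra arguments differ.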

Arguments

msstats_object

an object that inherits from the MSstatsInputFiles class.

...

additional parameters passed to specific cleaning functions.

use_frag

lgl, if TRUE, the selected fragments for each peptide will be used. The 'Selected_fragments' column is required.

use_pept

lgl, if TRUE, the selected peptides for each protein will be used. The 'Selected_peptides' column is required.

protein_id_col

character, name of a column with names of proteins.

remove_by_site

logical, if TRUE, proteins only identified by site will be removed.

channel_columns

character, regular expression that identifies channel columns in TMT data.

runs

chr, vector of Run labels.

fix_colnames

lgl, if TRUE, one of the rows will be used as colnames.

quantification_column

chr, name of a column used for quantification.

protein_id_column

chr, name of a column with protein IDs.

sequence_column

chr, name of a column with peptide sequences.

remove_shared

lgl, if TRUE, shared peptides will be removed.

remove_protein_groups

if TRUE, proteins with numProteins > 1 will be removed.

intensity_columns_regexp

chr, regular expression that defines intensity columns. Defaults to "Abundance", which means that columns whose names contain the word "Abundance" will be treated as intensities for different channels.

intensity

chr, specifies which column will be used for Intensity.
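The `channel_columns` and `intensity_columns_regexp` arguments both select columns by matching a regular expression against column names. As a rough illustration in base R, the sketch below uses made-up column names (an assumption, not taken from a real export) and `grepl` for the matching. The package may perform the matching differently internally.

```r
# Hypothetical column names resembling a TMT export; not from a real file.
column_names = c("Proteins", "Sequence",
                 "Abundance: F1: 126, Sample",
                 "Abundance: F1: 127N, Sample",
                 "Reporterintensitycorrected1")

# Columns whose names contain "Abundance" are treated as intensity columns.
intensity_columns = column_names[grepl("Abundance", column_names)]

# The same idea with the default MaxQuant pattern for channel columns.
channel_columns = column_names[grepl("Reporterintensitycorrected", column_names)]
```

A more specific pattern, such as one anchored with `^`, can be supplied when other column names happen to contain the same word.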

Value

data.table

Examples

evidence_path = system.file("tinytest/raw_data/MaxQuant/mq_ev.csv",
                            package = "MSstatsConvert")
pg_path = system.file("tinytest/raw_data/MaxQuant/mq_pg.csv",
                      package = "MSstatsConvert")
evidence = read.csv(evidence_path)
pg = read.csv(pg_path)
imported = MSstatsImport(list(evidence = evidence, protein_groups = pg),
                         "MSstats", "MaxQuant")
#> INFO [2021-05-10 23:03:41] ** Raw data from MaxQuant imported successfully.
cleaned_data = MSstatsClean(imported, protein_id_col = "Proteins")
#> INFO [2021-05-10 23:03:41] ** Rows with values of Potentialcontaminant equal to + are removed
#> INFO [2021-05-10 23:03:41] ** Rows with values of Reverse equal to + are removed
#> INFO [2021-05-10 23:03:41] ** Rows with values of Potentialcontaminant equal to + are removed
#> INFO [2021-05-10 23:03:41] ** Rows with values of Reverse equal to + are removed
#> INFO [2021-05-10 23:03:41] ** + Contaminant, + Reverse, + Potential.contaminant proteins are removed.
#> INFO [2021-05-10 23:03:41] ** Raw data from MaxQuant cleaned successfully.
head(cleaned_data)
#>    ProteinName PeptideSequence Modifications PrecursorCharge
#> 1:      P06959     AEAPAAAPAAK    Unmodified               2
#> 2:      P06959     AEAPAAAPAAK    Unmodified               2
#> 3:      P06959     AEAPAAAPAAK    Unmodified               2
#> 4:      P06959     AEAPAAAPAAK    Unmodified               2
#> 5:      P06959     AEAPAAAPAAK    Unmodified               2
#> 6:      P06959     AEAPAAAPAAK    Unmodified               2
#>                                            Run Intensity   Score
#> 1: 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1   4023100  76.332
#> 2: 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2   5132500  83.081
#> 3: 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3   2761600 104.430
#> 4: 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2   4091800  94.465
#> 5: 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3   4727000  88.596
#> 6: 121219_S_CCES_01_08_LysC_Try_1to10_Mixt_3_2   2258400  90.050