MSstatsPreprocess.Rd
Preprocess outputs from MS signal processing tools for analysis with MSstats
    MSstatsPreprocess(
      input,
      annotation,
      feature_columns,
      remove_shared_peptides = TRUE,
      remove_single_feature_proteins = TRUE,
      feature_cleaning = list(remove_features_with_few_measurements = TRUE,
                              summarize_multiple_psms = max),
      score_filtering = list(),
      exact_filtering = list(),
      pattern_filtering = list(),
      columns_to_fill = list(),
      aggregate_isotopic = FALSE,
      ...
    )
| Argument | Description |
|---|---|
| input | data.table processed by the MSstatsClean function. |
| annotation | annotation file generated by a signal processing tool. |
| feature_columns | character vector of names of columns that define spectral features. |
| remove_shared_peptides | logical, if TRUE shared peptides will be removed. |
| remove_single_feature_proteins | logical, if TRUE, proteins that only have one feature will be removed. |
| feature_cleaning | named list with maximum two (for |
| score_filtering | a list of named lists that specify filtering options. Details are provided in the vignette. |
| exact_filtering | a list of named lists that specify filtering options. Details are provided in the vignette. |
| pattern_filtering | a list of named lists that specify filtering options. Details are provided in the vignette. |
| columns_to_fill | a named list of scalars. If provided, columns with names defined by the names of this list and values corresponding to its elements will be added to the output. |
| aggregate_isotopic | logical. If |
| ... | additional parameters to |
data.table
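The filter lists and `columns_to_fill` described in the arguments can be easier to follow with a toy illustration. The sketch below uses base R only and does not call MSstatsConvert. The helper `apply_pattern_filter` and the `toy` data frame are hypothetical, and the sketch assumes that `filter = TRUE` removes rows matching `pattern` and that `drop_column = TRUE` removes the filtered column afterwards. The vignette remains the authoritative description of the actual behaviour.

```r
# Hypothetical helper (not part of MSstatsConvert): applies one pattern filter
# given as a list with the fields col_name, pattern, filter and drop_column.
apply_pattern_filter <- function(data, filter_spec) {
  col <- filter_spec$col_name
  if (isTRUE(filter_spec$filter)) {
    # Assumption: rows whose value in `col` matches the pattern are removed.
    matches <- grepl(filter_spec$pattern, data[[col]])
    data <- data[!matches, , drop = FALSE]
  }
  if (isTRUE(filter_spec$drop_column)) {
    # Assumption: the column used for filtering is dropped afterwards.
    data[[col]] <- NULL
  }
  data
}

# Made-up input with the columns the two filters refer to.
toy <- data.frame(
  PeptideSequence = c("AEAPAAAPAAK", "MKLVT", "LMDEGHLR", "TTPSYVAFTDTER"),
  Modifications = c("Unmodified", "Unmodified", "Oxidation (M)", "Unmodified"),
  Intensity = c(4023100, 120000, 98000, 560000),
  stringsAsFactors = FALSE
)

oxidation_filter <- list(col_name = "Modifications", pattern = "Oxidation",
                         filter = TRUE, drop_column = TRUE)
m_filter <- list(col_name = "PeptideSequence", pattern = "M",
                 filter = TRUE, drop_column = FALSE)

# Apply the filters one after another.
filtered <- toy
for (spec in list(oxidation_filter, m_filter)) {
  filtered <- apply_pattern_filter(filtered, spec)
}

# columns_to_fill: each list name becomes a column holding the given scalar.
columns_to_fill <- list(FragmentIon = NA, ProductCharge = NA)
for (name in names(columns_to_fill)) {
  filtered[[name]] <- columns_to_fill[[name]]
}

print(filtered)
```

Keeping each filter as a small named list makes it possible to pass several of them at once, in the same way `pattern_filtering` takes a list of named lists.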
    library(MSstatsConvert)

    evidence_path = system.file("tinytest/raw_data/MaxQuant/mq_ev.csv",
                                package = "MSstatsConvert")
    pg_path = system.file("tinytest/raw_data/MaxQuant/mq_pg.csv",
                          package = "MSstatsConvert")
    evidence = read.csv(evidence_path)
    pg = read.csv(pg_path)
    imported = MSstatsImport(list(evidence = evidence, protein_groups = pg),
                             "MSstats", "MaxQuant")
    #> INFO [2021-05-10 23:03:42] ** Raw data from MaxQuant imported successfully.

    # Clean the imported data; protein_id_col = "Proteins" is assumed here.
    cleaned_data = MSstatsClean(imported, protein_id_col = "Proteins")
    #> INFO [2021-05-10 23:03:42] ** Rows with values of Potentialcontaminant equal to + are removed
    #> INFO [2021-05-10 23:03:42] ** Rows with values of Reverse equal to + are removed
    #> INFO [2021-05-10 23:03:42] ** Rows with values of Potentialcontaminant equal to + are removed
    #> INFO [2021-05-10 23:03:42] ** Rows with values of Reverse equal to + are removed
    #> INFO [2021-05-10 23:03:42] ** + Contaminant, + Reverse, + Potential.contaminant proteins are removed.
    #> INFO [2021-05-10 23:03:42] ** Raw data from MaxQuant cleaned successfully.

    annot_path = system.file("tinytest/raw_data/MaxQuant/annotation.csv",
                             package = "MSstatsConvert")
    mq_annot = MSstatsMakeAnnotation(cleaned_data, read.csv(annot_path),
                                     Run = "Rawfile")
    #> INFO [2021-05-10 23:03:42] ** Using provided annotation.
    #> INFO [2021-05-10 23:03:42] ** Run labels were standardized to remove symbols such as '.' or '%'.

    # Filter out peptides containing methionine (M) and oxidized peptides
    m_filter = list(col_name = "PeptideSequence", pattern = "M",
                    filter = TRUE, drop_column = FALSE)
    oxidation_filter = list(col_name = "Modifications", pattern = "Oxidation",
                            filter = TRUE, drop_column = TRUE)
    msstats_format = MSstatsPreprocess(
      cleaned_data, mq_annot,
      feature_columns = c("PeptideSequence", "PrecursorCharge"),
      columns_to_fill = list(FragmentIon = NA, ProductCharge = NA),
      pattern_filtering = list(oxidation = oxidation_filter, m = m_filter)
    )
    #> INFO [2021-05-10 23:03:42] ** The following options are used:
    #> - Features will be defined by the columns: PeptideSequence, PrecursorCharge
    #> - Shared peptides will be removed.
    #> - Proteins with a single feature will be removed.
    #> - Features with less than 3 measurements across runs will be removed.
    #> INFO [2021-05-10 23:03:42] ** Sequences containing Oxidation are removed.
    #> INFO [2021-05-10 23:03:42] ** Sequences containing M are removed.
    #> INFO [2021-05-10 23:03:42] ** Features with all missing measurements across runs are removed.
    #> INFO [2021-05-10 23:03:42] ** Shared peptides are removed.
    #> INFO [2021-05-10 23:03:42] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
    #> INFO [2021-05-10 23:03:42] ** Features with one or two measurements across runs are removed.
    #> INFO [2021-05-10 23:03:42] Proteins with a single feature are removed.
    #> INFO [2021-05-10 23:03:42] ** Run annotation merged with quantification data.

    head(msstats_format)
    #>                                            Run PeptideSequence PrecursorCharge
    #> 1: 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1     AEAPAAAPAAK               2
    #> 2: 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2     AEAPAAAPAAK               2
    #> 3: 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3     AEAPAAAPAAK               2
    #> 4: 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2     AEAPAAAPAAK               2
    #> 5: 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3     AEAPAAAPAAK               2
    #> 6: 121219_S_CCES_01_08_LysC_Try_1to10_Mixt_3_2     AEAPAAAPAAK               2
    #>    Intensity ProteinName Condition BioReplicate Experiment IsotopeLabelType
    #> 1:   4023100      P06959         1            1        1_1                L
    #> 2:   5132500      P06959         1            1        1_2                L
    #> 3:   2761600      P06959         1            1        1_3                L
    #> 4:   4091800      P06959         2            2        2_2                L
    #> 5:   4727000      P06959         2            2        2_3                L
    #> 6:   2258400      P06959         3            3        3_2                L
    #>    FragmentIon ProductCharge
    #> 1:          NA            NA
    #> 2:          NA            NA
    #> 3:          NA            NA
    #> 4:          NA            NA
    #> 5:          NA            NA
    #> 6:          NA            NA
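The log messages above mention three cleaning steps: summarizing multiple measurements per feature and run with `max`, removing features with one or two measurements, and removing proteins with a single feature. The following base R sketch reproduces those steps on made-up data to show what they amount to. It is not the package's implementation, the `psms` data frame is hypothetical, and the order of steps is taken only from the order of the log lines.

```r
# Made-up measurements: a feature is a (PeptideSequence, PrecursorCharge) pair.
psms <- data.frame(
  ProteinName = c(rep("P1", 8), rep("P2", 3)),
  PeptideSequence = c(rep("AEAPAAAPAAK", 4), rep("LVTDLTK", 3), "GGHLR",
                      rep("TTPSYVAFTDTER", 3)),
  PrecursorCharge = c(rep(2, 4), rep(2, 3), 3, rep(2, 3)),
  Run = c("R1", "R1", "R2", "R3", "R1", "R2", "R3", "R1", "R1", "R2", "R3"),
  Intensity = c(10, 30, 20, 25, 5, 6, 7, 8, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Step 1: keep one value per feature and run, here the maximum intensity.
summarized <- aggregate(
  Intensity ~ ProteinName + PeptideSequence + PrecursorCharge + Run,
  data = psms, FUN = max
)

# Step 2: drop features measured in fewer than three runs.
summarized$feature_id <- paste(summarized$PeptideSequence,
                               summarized$PrecursorCharge, sep = "_")
n_measurements <- ave(summarized$Intensity, summarized$feature_id, FUN = length)
kept <- summarized[n_measurements >= 3, , drop = FALSE]

# Step 3: drop proteins that are left with a single feature.
features_per_protein <- tapply(kept$feature_id, kept$ProteinName,
                               function(x) length(unique(x)))
multi_feature <- names(features_per_protein)[features_per_protein > 1]
result <- kept[kept$ProteinName %in% multi_feature, , drop = FALSE]

print(result[order(result$PeptideSequence, result$Run), ])
```

In this toy input, the feature seen in only one run is dropped in step 2, and the protein that then has a single feature is dropped in step 3, so only the two well-measured features of `P1` remain.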