File Readers
Import with:
from mspypeline import BaseReader, MQReader
BaseReader
-
class mspypeline.BaseReader(start_dir, reader_config, loglevel=10)
Base reader that provides a data dictionary with keys to the data. The data remains on disk and is only loaded on demand. This is the parent class of any file reader that will be used to preprocess data to the internal format.
Example
>>> # example for a new custom reader (assumes BaseReader and BasePlotter are in scope)
>>> class CustomReader(BaseReader):
...     name = "reader"  # this is the name of the reader in the yaml file
...     required_files = []  # this is a list of strings of all files that should be parsed
...     plotter = BasePlotter
...
...     def __init__(self, start_dir, reader_config, loglevel):
...         super().__init__(start_dir, reader_config, loglevel)
...         for file in CustomReader.required_files:
...             self.full_data[file] = [0, 0, 10]  # this should be the data from the file
>>> r = CustomReader("", {}, 10)
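The on-demand loading described above can be pictured with a small standalone sketch. It does not use mspypeline and is not the library's implementation; `LazyDataDict` and `load_protein_groups` are hypothetical names chosen for illustration. The mapping stores one loader per key and only calls it the first time the key is read.

```python
from collections.abc import Mapping


class LazyDataDict(Mapping):
    """Hypothetical mapping that calls a loader the first time a key is read."""

    def __init__(self, loaders):
        self._loaders = dict(loaders)  # key -> zero-argument callable
        self._cache = {}               # key -> already loaded value

    def __getitem__(self, key):
        if key not in self._cache:
            # load on first access only; later reads are served from the cache
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __iter__(self):
        return iter(self._loaders)

    def __len__(self):
        return len(self._loaders)


load_calls = []  # records every time the loader actually runs


def load_protein_groups():
    # stand-in for parsing a file from disk
    load_calls.append("proteinGroups")
    return [0, 0, 10]


data = LazyDataDict({"proteinGroups": load_protein_groups})
calls_before_access = len(load_calls)  # nothing has been loaded yet
first = data["proteinGroups"]          # triggers the loader
second = data["proteinGroups"]         # served from the cache
```

Listing the keys does not trigger any loading, so a reader built this way can advertise which files it knows about without parsing them.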
-
__init__(start_dir, reader_config, loglevel=10)
- Parameters
start_dir (str) – location of the directory/txt folder that contains the data.
reader_config (dict) – mapping of the file reader configuration (e.g. as given in the config.yml file)
loglevel (int) – level of the logger
-
MQReader
-
class mspypeline.MQReader(start_dir, reader_config, index_col='Gene name', duplicate_handling='sum', drop_columns=None, loglevel=10)
A child class of the BaseReader.
The MQReader preprocesses data from MaxQuant files into the internal data format to provide the correct input for the plotters. The only file required to start the MQReader is the proteinGroups.txt file from MaxQuant.
Additionally, the file reader can preprocess the evidence, msmsScans, msScans, parameters, peptides and summary txt files from the MaxQuant output.
The reader also recognizes sample_mapping.txt files, if provided, and corrects the sample naming, for instance when the naming convention is violated (see Analysis Design).
-
__init__(start_dir, reader_config, index_col='Gene name', duplicate_handling='sum', drop_columns=None, loglevel=10)
- Parameters
start_dir (str) – location of the directory/txt folder that contains the data.
reader_config (dict) – mapping of the file reader configuration (e.g. as given in the config.yml file)
index_col (str) – identifier type used to index the detected proteins in the proteinGroups.txt file. If provided in the reader_config, the value is taken from there.
duplicate_handling (str) – how proteins with a duplicate index_col should be treated; can be “sum” or “drop”. If provided in the reader_config, the value is taken from there.
drop_columns (Union[list, tuple, str]) – samples to be excluded from the analysis. If provided in the reader_config, the value is taken from there.
loglevel (int) – level of the logger
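Two behaviours in the parameter list above lend themselves to a short illustration: values in the reader_config taking precedence over keyword defaults, and the two duplicate_handling modes. The following is a minimal standalone sketch, not mspypeline code. `resolve_option` and `handle_duplicates` are hypothetical helpers, the config key name simply mirrors the keyword argument, and it is an assumption that “drop” discards every entry whose index value occurs more than once (the library may keep one of them instead).

```python
from collections import Counter, defaultdict


def resolve_option(reader_config, key, default):
    """Return the value from reader_config if present, else the keyword default."""
    return reader_config.get(key, default)


def handle_duplicates(rows, how="sum"):
    """rows: list of (index_value, intensity) pairs; returns dict index -> intensity."""
    if how == "sum":
        totals = defaultdict(float)
        for key, value in rows:
            totals[key] += value  # add up intensities sharing an index value
        return dict(totals)
    if how == "drop":
        counts = Counter(key for key, _ in rows)
        # assumption: every entry whose key occurs more than once is discarded
        return {key: value for key, value in rows if counts[key] == 1}
    raise ValueError(f"unknown duplicate_handling: {how!r}")


# made-up example data: "ACTB" appears twice
rows = [("ACTB", 1.0), ("ACTB", 2.0), ("GAPDH", 5.0)]
reader_config = {"duplicate_handling": "drop"}  # hypothetical config content

how = resolve_option(reader_config, "duplicate_handling", "sum")
dropped = handle_duplicates(rows, how)
summed = handle_duplicates(rows, "sum")
```

With this toy input, the config value wins over the default, so `dropped` only keeps the unique entry, while `summed` merges the two "ACTB" rows into one.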
-
plotter
alias of mspypeline.core.MSPPlots.MaxQuantPlotter.MaxQuantPlotter
-
preprocess_contaminants()
Preprocess the proteinGroups.txt file to internal format and return a DataFrame with all proteins marked as contaminant. Contaminants are defined as proteins marked as “Only identified by site”, “Reverse” or “Potential contaminant” in the proteinGroups.txt file.
- Returns
DataFrame containing preprocessed data of contaminants from proteinGroups.txt file
- Return type
DataFrame
-
preprocess_evidence()
Preprocess the evidence.txt file to internal format and return a DataFrame with all peptides not marked as contaminant. Contaminants are defined as peptides marked as “Reverse” or “Potential contaminant” in the evidence.txt file.
- Returns
DataFrame containing preprocessed data from evidence.txt file
- Return type
DataFrame
-
preprocess_msScans()
Preprocess the msScans.txt file to internal format and return a DataFrame. Only the columns “Raw file”, “Total ion current” and “Retention time” are read in.
- Returns
DataFrame containing preprocessed data from msScans.txt file
- Return type
DataFrame
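Reading only a fixed set of columns from a tab-separated file can be sketched as follows. This is a standalone illustration using the standard library rather than the reader's actual code; `read_selected_columns` is a hypothetical helper and the sample text is made up, with an extra “Scan number” column added to show that unrequested columns are left out.

```python
import csv
import io

# the three columns named in the description above
WANTED = ("Raw file", "Total ion current", "Retention time")

# made-up, tab-separated stand-in for a scans file
RAW = (
    "Raw file\tScan number\tRetention time\tTotal ion current\n"
    "sample_A\t1\t0.5\t1200.0\n"
    "sample_A\t2\t1.0\t1500.0\n"
)


def read_selected_columns(text, columns=WANTED):
    """Parse tab-separated text and keep only the requested columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{col: row[col] for col in columns} for row in reader]


rows = read_selected_columns(RAW)
```

Values stay as strings in this sketch; converting them to numbers is left out to keep the column selection itself in focus.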
-
preprocess_msmsScans()
Preprocess the msmsScans.txt file to internal format and return a DataFrame. Only the columns “Raw file”, “Total ion current” and “Retention time” are read in.
- Returns
DataFrame containing preprocessed data from msmsScans.txt file
- Return type
DataFrame
-
preprocess_parameters()
Preprocess the parameters.txt file to internal format and return a DataFrame.
- Returns
DataFrame containing preprocessed data from parameters.txt file
- Return type
DataFrame
-
preprocess_peptides()
Preprocess the peptides.txt file to internal format and return a DataFrame with all peptides not marked as contaminant. Contaminants are defined as peptides marked as “Reverse” or “Potential contaminant” in the peptides.txt file.
- Returns
DataFrame containing preprocessed data from peptides.txt file
- Return type
DataFrame
-
preprocess_proteinGroups()
Preprocess the proteinGroups.txt file to internal format and return a DataFrame with all proteins not marked as contaminant. Contaminants are defined as proteins marked as “Only identified by site”, “Reverse” or “Potential contaminant” in the proteinGroups.txt file.
- Returns
DataFrame containing preprocessed data from proteinGroups.txt file
- Return type
DataFrame
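The contaminant rule shared by preprocess_proteinGroups and preprocess_contaminants can be sketched as a split of the rows into two groups. This is a minimal standalone illustration, not the library's implementation. `split_contaminants` is a hypothetical helper, the sample text is made up, and it is an assumption that flagged rows carry a "+" in the respective column.

```python
import csv
import io

# columns named in the description above
CONTAMINANT_COLUMNS = ("Only identified by site", "Reverse", "Potential contaminant")

# made-up, tab-separated stand-in for proteinGroups.txt
RAW = (
    "Gene name\tOnly identified by site\tReverse\tPotential contaminant\n"
    "ACTB\t\t\t\n"
    "KRT1\t\t\t+\n"
    "REV_X\t\t+\t\n"
    "GAPDH\t\t\t\n"
)


def split_contaminants(text):
    """Return (kept, contaminants) as lists of gene names."""
    kept, contaminants = [], []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # assumption: a "+" in any of the three columns flags the row
        flagged = any(row[col] == "+" for col in CONTAMINANT_COLUMNS)
        (contaminants if flagged else kept).append(row["Gene name"])
    return kept, contaminants


kept, contaminants = split_contaminants(RAW)
```

In this picture, `kept` corresponds to what preprocess_proteinGroups would return and `contaminants` to what preprocess_contaminants would return, each derived from the same file.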
-
preprocess_summary()
Preprocess the summary.txt file to internal format and return a DataFrame.
- Returns
DataFrame containing preprocessed data from summary.txt file
- Return type
DataFrame
-