MSPPlots
Import with:
from mspypeline import BasePlotter, MaxQuantPlotter
BasePlotter
class mspypeline.BasePlotter(start_dir, reader_data=None, intensity_df_name='', interesting_proteins=None, go_analysis_gene_names=None, configs=None, required_reader=None, intensity_entries=(), loglevel=10)
Base plotter to create plots. Its methods come in two main kinds. The “get_” functions calculate and provide the data for the “plot_” functions. The “plot_” functions call the “get_” functions together with functions from the matplotlib backend, combining data calculation, plotting and saving of the results in one method.
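The get_/plot_ split described above can be pictured with the minimal sketch below. TinyPlotter and its methods are hypothetical names, not part of mspypeline, and a list of text lines stands in for the matplotlib figure that a real “plot_” method would create and save.

```python
from typing import Dict, List


class TinyPlotter:
    """Hypothetical stand-in illustrating the get_/plot_ split; not mspypeline code."""

    def __init__(self, data: Dict[str, List[float]]):
        # maps a sample name to its protein intensities
        self.data = data

    def get_mean_intensity_data(self) -> Dict[str, Dict[str, float]]:
        # "get_" step: only computes, and returns the result under a named key
        means = {sample: sum(values) / len(values) for sample, values in self.data.items()}
        return {"means": means}

    def plot_mean_intensity(self) -> List[str]:
        # "plot_" step: calls the matching "get_" method, then renders the result.
        # Text lines stand in for a matplotlib figure to keep the sketch dependency-free.
        data = self.get_mean_intensity_data()
        return [f"{sample}: {mean:.2f}" for sample, mean in data["means"].items()]
```

Keeping the computation in the “get_” method means the numbers can be inspected or tested without producing a figure.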
__init__(start_dir, reader_data=None, intensity_df_name='', interesting_proteins=None, go_analysis_gene_names=None, configs=None, required_reader=None, intensity_entries=(), loglevel=10)
- Parameters
start_dir (str) – location to save results
reader_data (Optional[Dict[str, Dict[str, pandas.core.frame.DataFrame]]]) – mapping to provide input data
intensity_df_name (str) – name/key to input data
interesting_proteins (Optional[Dict[str, pandas.core.series.Series]]) – mapping with pathway proteins to analyze
go_analysis_gene_names (Optional[Dict[str, pandas.core.series.Series]]) – mapping with go terms to analyze
configs (Optional[dict]) – mapping of configuration
required_reader (Optional[str]) – name of the file reader
intensity_entries (Tuple[str, str, str]) – tuple of (key in all_tree_dict, prefix in data, name in plot). See add_intensity_column().
loglevel (int) – level of the logger
add_intensity_column(option_name, name_in_file, name_in_plot, scale='normal', df=None)
Adds two options to all_intensities_dict and all_tree_dict, called option_name and option_name_log2.
- Parameters
option_name (str) – the internal name of the added data, which can be referred to via the df_to_use option, e.g. lfq or ibaq
name_in_file (str) – prefix of the columns, e.g. Intensity or LFQ intensity
name_in_plot (str) – name shown in the plots, e.g. “LFQ Intensity” or “iBAQ values”
scale (str) – whether the data is in “normal” or “log2” scale
df (Optional[pandas.core.frame.DataFrame]) – can be passed to use instead of BasePlotter.intensity_df
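As a rough illustration of the documented effect of add_intensity_column(), the sketch below builds an option and its _log2 counterpart from prefixed columns. It works on plain dicts instead of a pandas DataFrame; the helper name, the assumption that the input is on the “normal” scale, and treating 0 as a missing value are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional


def add_intensity_column_sketch(
    table: Dict[str, List[float]],
    option_name: str,
    name_in_file: str,
    options: Dict[str, Dict[str, List[Optional[float]]]],
) -> None:
    """Hypothetical helper, not the mspypeline implementation.
    Assumes 'normal' scale input and treats 0 as a missing value (None)."""
    prefix = name_in_file + " "
    # keep only columns such as "LFQ intensity S1" and strip the prefix -> "S1"
    selected = {col[len(prefix):]: vals for col, vals in table.items() if col.startswith(prefix)}
    options[option_name] = {s: [v if v > 0 else None for v in vals] for s, vals in selected.items()}
    options[option_name + "_log2"] = {
        s: [math.log2(v) if v > 0 else None for v in vals] for s, vals in selected.items()
    }
```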
add_normalized_option(df_to_use, normalizer, norm_option_name)
Adds a new option/key of available data sets in all_intensities_dict and all_tree_dict by taking the data set all_intensities_dict[df_to_use], performing the normalization on the data and then adding the new option with add_intensity_column().
- Parameters
df_to_use (str) – data set that should be normalized
normalizer (Union[Type[mspypeline.modules.Normalization.BaseNormalizer], Any]) – normalizer either derived from BaseNormalizer or a class with a fit_transform() method
norm_option_name (str) – suffix of the new option name
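Since normalizer may be any class with a fit_transform() method, a duck-typed normalizer can be as small as the sketch below. MedianShiftNormalizer is a hypothetical example, not one of mspypeline's BaseNormalizer subclasses; it assumes log2-scale data stored as a dict of samples (None = missing) and shifts each sample so that all sample medians coincide.

```python
import statistics
from typing import Dict, List, Optional

Data = Dict[str, List[Optional[float]]]  # sample -> log2 intensities (None = missing)


class MedianShiftNormalizer:
    """Hypothetical normalizer exposing fit/transform/fit_transform."""

    def fit(self, data: Data) -> "MedianShiftNormalizer":
        # median of each sample, ignoring missing values
        self.medians_ = {
            s: statistics.median(v for v in vals if v is not None) for s, vals in data.items()
        }
        # common target: mean of the sample medians
        self.target_ = statistics.fmean(self.medians_.values())
        return self

    def transform(self, data: Data) -> Data:
        return {
            s: [None if v is None else v - self.medians_[s] + self.target_ for v in vals]
            for s, vals in data.items()
        }

    def fit_transform(self, data: Data) -> Data:
        return self.fit(data).transform(data)
```

The sketch only shows the shape of the interface; a normalizer actually used with add_normalized_option() would have to operate on the pandas DataFrame it receives.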
create_results()
Creates all plots that were chosen/set to True in the settings “create plot” (see Analysis settings).
classmethod from_MSPInitializer(mspinit_instance, **kwargs)
Creates a BasePlotter from an MSPInitializer.
- Parameters
mspinit_instance (mspypeline.core.MSPInitializer.MSPInitializer) – instance of an MSPInitializer used to get correct inputs for the plotter.
kwargs – all arguments that are passed to BasePlotter.__init__() can be overwritten by passing them as kwargs.
- Returns
functional plotter
- Return type
BasePlotter
classmethod from_file_reader(reader_instance, **kwargs)
Creates a plotter (BasePlotter or MaxQuantPlotter) from a BaseReader.
- Parameters
reader_instance (mspypeline.file_reader.BaseReader.BaseReader) – instance of a BaseReader used to get correct inputs for the plotter.
kwargs – all arguments that are passed to BasePlotter.__init__() can be overwritten by passing them as kwargs.
- Returns
functional plotter
- Return type
BasePlotter
get_boxplot_data(df_to_use, level, **kwargs)
Gets protein intensities for all samples per group of the selected level and then sorts the samples by their median intensity.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
Dictionary with key “protein_intensities” to a DataFrame containing the protein intensities per group sorted by median intensity
- Return type
Dict
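The median-based ordering performed by get_boxplot_data() can be sketched in plain Python as follows. The helper name is hypothetical, missing values are represented as None, and ascending order is an assumption of this sketch.

```python
import statistics
from typing import Dict, List, Optional


def sort_samples_by_median(intensities: Dict[str, List[Optional[float]]]) -> List[str]:
    """Hypothetical helper: sample names ordered by median intensity (ascending),
    ignoring missing values (None)."""
    medians = {
        sample: statistics.median(v for v in values if v is not None)
        for sample, values in intensities.items()
    }
    return sorted(medians, key=medians.get)
```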
get_detected_proteins_per_replicate_data(df_to_use, level, **kwargs)
Counts the number of protein intensity values greater than 0 (number of detected proteins) per sample of a group from the selected level.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
Dictionary with key “all_height” to a mapping of protein counts as Series per group
- Return type
Dict
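Counting detected proteins per replicate amounts to counting intensities greater than 0 in each sample. The sketch below uses a hypothetical helper on nested dicts (sample → protein → intensity) and, as an assumption, also returns the number of proteins detected in at least one replicate, which corresponds to the group total shown in the related plot.

```python
from typing import Dict, Tuple


def detected_per_replicate(group: Dict[str, Dict[str, float]]) -> Tuple[Dict[str, int], int]:
    """Hypothetical helper. Returns the per-sample count of proteins with
    intensity > 0 and the number of proteins detected in any sample of the group."""
    per_sample = {
        sample: sum(1 for value in proteins.values() if value > 0)
        for sample, proteins in group.items()
    }
    detected_anywhere = {
        protein
        for proteins in group.values()
        for protein, value in proteins.items()
        if value > 0
    }
    return per_sample, len(detected_anywhere)
```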
get_detection_counts_data(df_to_use, level, **kwargs)
Counts the number of intensity values greater than 0 per protein (number of samples that the protein is detected in) per group of the selected level.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
Dictionary with key “counts” to a DataFrame containing the counts of proteins detected in a sample
- Return type
Dict
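Detection counts can be sketched the same way: first count for each protein in how many samples it has an intensity greater than 0, then tally how many proteins share each count. The helper name and the input layout (protein → intensities across the group's samples) are assumptions.

```python
from collections import Counter
from typing import Dict, List


def detection_counts(group: Dict[str, List[float]]) -> Counter:
    """Hypothetical helper: maps 'number of samples a protein is detected in'
    to 'number of proteins with that count'."""
    per_protein = {p: sum(1 for v in values if v > 0) for p, values in group.items()}
    return Counter(per_protein.values())
```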
get_experiment_comparison_data(df_to_use, full_name1, full_name2)
Gets protein intensities for all samples of the two given groups, then determines the proteins that can be compared between the groups and those that are unique to each group (see Thresholds and Comparisons), and takes the mean intensity of these proteins.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
full_name1 (str) – name of the first data node/group that should be compared to ‘full_name2’
full_name2 (str) – name of the second data node/group that should be compared to ‘full_name1’
- Returns
Dictionary with keys “protein_intensities_sample1” and “protein_intensities_sample2” to Series containing the mean protein intensities of sample 1 and sample 2 and “exclusive_sample1” and “exclusive_sample2” to Series containing the mean intensities of unique proteins for sample 1 and sample 2.
- Return type
Dict
get_go_analysis_data(df_to_use, level)
Calculates an enrichment analysis for all samples per group of the selected level and for each given GO list (see plot_go_analysis()). Significances are calculated with Fisher’s exact test.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
- Returns
Dictionary with keys “heights” to a ddict containing the counts of proteins per sample of each given GO list, “test_results” to a ddict containing the corresponding Fisher’s exact test results and “go_length” to a list containing the total number of proteins of each chosen GO list
- Return type
Dict
get_intensity_heatmap_data(df_to_use, level, sort_index=False, sort_index_by_missing=True, sort_columns_by_missing=True, **kwargs)
Gets the protein intensities for all samples per group of the selected level and sorts samples and proteins according to the settings.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
sort_index (bool) – should proteins be sorted alphanumerically
sort_index_by_missing (bool) – should proteins be sorted by number of missing values across samples
sort_columns_by_missing (bool) – should samples be sorted by number of missing values
kwargs – accepts kwargs
- Returns
Dictionary with key “intensities” to a DataFrame containing protein intensities of samples
- Return type
Dict
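The effect of sort_index_by_missing and sort_columns_by_missing can be approximated as below. The helper is hypothetical, the nested-dict layout stands in for the DataFrame, and putting the fewest missing values first (ties broken alphanumerically) is an assumption about the ordering direction.

```python
from typing import Dict, List, Optional, Tuple

Matrix = Dict[str, Dict[str, Optional[float]]]  # protein -> sample -> intensity (None = missing)


def sort_by_missing(matrix: Matrix) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: returns protein and sample orders by number of missing values."""
    samples = sorted({s for row in matrix.values() for s in row})
    missing_per_protein = {
        p: sum(1 for s in samples if row.get(s) is None) for p, row in matrix.items()
    }
    missing_per_sample = {
        s: sum(1 for row in matrix.values() if row.get(s) is None) for s in samples
    }
    # fewest missing values first; ties are broken alphanumerically
    proteins = sorted(matrix, key=lambda p: (missing_per_protein[p], p))
    ordered_samples = sorted(samples, key=lambda s: (missing_per_sample[s], s))
    return proteins, ordered_samples
```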
get_intensity_histograms_data(df_to_use, level, **kwargs)
Gets protein intensity values for each sample per group of the selected level.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
Dictionary with key “hist_data” to a DataFrame containing the protein intensity values per group
- Return type
Dict
get_kde_data(df_to_use, level, **kwargs)
Gets the protein intensities for all samples per group of the selected level.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
Dictionary with key “intensities” to a DataFrame containing the protein intensities per group
- Return type
Dict
get_n_protein_vs_quantile_data(df_to_use, level, quantile_range=None, **kwargs)
Gets protein intensities for all samples per group, then counts the number of intensity values greater than 0 (total number of detected proteins) and calculates the quantiles per sample.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
quantile_range (Optional[numpy.array]) – which quantile range should be used for analysis
kwargs – accepts kwargs
- Returns
Dictionary with keys “quantiles” to a DataFrame of calculated quantiles per sample and “n_proteins” to a Series of total number of identified proteins per sample
- Return type
Dict
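The per-sample counting and quantile step can be sketched as follows, using linear interpolation between the closest ranks, which is NumPy's default quantile rule. Whether mspypeline uses exactly this rule, and whether it computes quantiles on detected values only, are assumptions; the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def quantile(sorted_values: Sequence[float], q: float) -> float:
    # linear interpolation between the two closest ranks (NumPy's default "linear" rule)
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def n_proteins_vs_quantiles(
    samples: Dict[str, List[float]], quantile_range: Sequence[float]
) -> Tuple[Dict[str, List[float]], Dict[str, int]]:
    quantiles: Dict[str, List[float]] = {}
    n_proteins: Dict[str, int] = {}
    for name, values in samples.items():
        detected = sorted(v for v in values if v > 0)  # intensity > 0 counts as detected
        n_proteins[name] = len(detected)
        quantiles[name] = [quantile(detected, q) for q in quantile_range]
    return quantiles, n_proteins
```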
get_pathway_analysis_data(df_to_use, level, pathway, equal_var=True, **kwargs)
Filters out all proteins of the given pathways for all samples per group of the selected level, then calculates the pairwise significances between the groups with an independent t-test (see plot_pathway_analysis()) for all those proteins that can be compared (see Thresholds and Comparisons).
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
pathway (str) – which pathway should be analysed
equal_var (bool) – whether equal variance should be assumed
kwargs – accepts kwargs
- Returns
Dictionary with keys “protein_intensities” to a DataFrame containing the protein intensities of detected proteins from all given pathways per group and “significances” to a DataFrame containing the calculated significances between groups for each protein of all given pathways
- Return type
Dict
get_pca_data(df_to_use, level, n_components=2, fill_value=0, no_missing_values=True, fill_na_before_norm=False, **kwargs)
Gets protein intensities for all samples per group, processes the data according to the given arguments and then performs a dimensionality reduction (PCA) using sklearn.decomposition.PCA.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
level (int) – at which level of the data tree should the data be compared
n_components (int) – how many principal components should be calculated
fill_value (float) – if data should be interpolated, which fill value should be used
no_missing_values (bool) – should missing values be neglected
fill_na_before_norm (bool) – if data should be interpolated, should this be done before normalisation
kwargs – accepts kwargs
- Returns
Dictionary with keys “pca_data” to a DataFrame containing the output of a PCA using sklearn.decomposition and “pca_fit” to a PCA object that was fitted to normalized input data
- Return type
Dict
get_r_volcano_data(g1, g2, df_to_use)
Gets the protein intensities for all samples of the two given groups, then calculates the proteins that can be compared between groups and those unique for each group (see Thresholds and Comparisons). The protein intensities to be compared are handed over to the R package limma, which outputs the logFC, p-value, adjusted p-value (Benjamini-Hochberg) and other data calculated based on a moderated t-statistic. P-value calculations are corrected for the intensity-variance relationship. The results are afterwards converted back to Python format.
Note
This function uses the R package limma, which is automatically downloaded the first time this analysis is performed.
- Parameters
g1 (str) – first sample that should be analysed (downregulated)
g2 (str) – second sample that should be analysed (upregulated)
df_to_use (str) – which dataframes/intensities should be analysed
- Returns
Dictionary with keys “volcano_data” to a DataFrame containing processed output of the limma.eBayes analysis, and “unique_g1” and “unique_g2” to Series containing the unique protein intensities per group
- Return type
Dict
get_rank_data(df_to_use, full_name, **kwargs)
Gets protein intensity values of the selected group and ranks the proteins by their intensity value.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
full_name (str) – which data node/group of samples should be compared
- Returns
Dictionary with key “rank_data” to Series containing the protein intensities of the group ranked by intensity value
- Return type
Dict
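Ranking as described for get_rank_data() and plot_rank() can be sketched like this: replicates are averaged, proteins without any value are left out, and the most intense protein receives rank 0. The helper name and the dict layout (protein → replicate intensities, None = missing) are assumptions.

```python
import statistics
from typing import Dict, List, Optional, Tuple


def rank_proteins(group: Dict[str, List[Optional[float]]]) -> List[Tuple[int, str, float]]:
    """Hypothetical helper: returns (rank, protein, mean intensity), rank 0 = highest."""
    means = {
        p: statistics.fmean(v for v in vals if v is not None)
        for p, vals in group.items()
        if any(v is not None for v in vals)  # proteins with only missing values are dropped
    }
    ordered = sorted(means.items(), key=lambda item: item[1], reverse=True)
    return [(rank, p, m) for rank, (p, m) in enumerate(ordered)]
```

The median of the resulting mean intensities is the value that the rank plot reports in its legend.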
get_relative_std_data(df_to_use, full_name, **kwargs)
Calculates which proteins of a group can be used for the analysis (see Thresholds and Comparisons) and filters out proteins below the threshold.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
full_name (str) – which data node/group of samples should be compared
- Returns
Dictionary with key “intensities” to a DataFrame containing the protein intensities of the group
- Return type
Dict
get_scatter_replicates_data(df_to_use, full_name)
Gets protein intensity values for each sample of a selected group.
- Parameters
df_to_use (str) – which dataframes/intensities should be analysed
full_name (str) – which data node/group of samples should be compared
- Returns
Dictionary with key “scatter_data” to a DataFrame containing the protein intensity values per replicate
- Return type
Dict
get_venn_data_per_key(df_to_use, key)
Counts the protein intensity values greater than 0 (number of detected proteins) for each replicate of a group from the selected level.
- Parameters
df_to_use (str) – which dataframe/intensity should be analysed
key (str) – which data node/group of samples should be compared
- Returns
Dictionary containing the proteins detected per sample
- Return type
Dict
get_venn_group_data(df_to_use, level, non_na_function=<function get_number_of_non_na_values>)
Calculates which proteins can be compared between groups or are unique for a group of the selected level (see Thresholds and Comparisons) and then counts these proteins per group.
- Parameters
df_to_use (str) – which dataframe/intensity should be analysed
level (int) – at which level of the data tree should the data be compared
non_na_function – threshold function to determine if proteins can be compared, default: get_number_of_non_na_values()
- Returns
Dictionary containing the proteins that can be compared per group
- Return type
Dict
plot_all_normalizer_overview(dfs_to_use, levels, plot_function, file_name, normalizers=None, **kwargs)
Helper method to create a multi-page file containing one plot per normalization option. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use – which dataframes/intensities should be plotted
levels – at which level of the data tree should the data be compared
plot_function – which plot should be created
file_name – name of the file that is created and saved
normalizers – normalizers either derived from BaseNormalizer or a class with a fit_transform() method
kwargs – accepts kwargs
- Returns
A list of all created plots
- Return type
list
plot_boxplot(dfs_to_use, levels, **kwargs)
A standard boxplot displaying the quantile distribution (median, quartiles and whiskers) per group of the selected level and ranking the groups by median intensity from the bottom of the graph to the top. The plot is created by applying get_boxplot_data() to get protein intensities for all samples per group of the selected level and then sort the samples by their median intensity. The data is plotted and saved using save_boxplot_results(). The boxplot is part of the Normalization overview. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
plot_detected_proteins_per_replicate(dfs_to_use, levels, **kwargs)
Uses get_detected_proteins_per_replicate_data() to count the number of protein intensity values greater than 0 (number of detected proteins) per sample of a group from the selected level. The data is plotted and saved using save_detected_proteins_per_replicate_results() as a bar diagram showing the number of detected proteins per sample as well as the total number of detected proteins for each group of the selected level. The average number of detected proteins per group is indicated as a gray dashed line. To view adjustable parameters see “plot_detected_proteins_per_replicate_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
plot_detection_counts(dfs_to_use, levels, **kwargs)
Uses get_detection_counts_data() to count the number of intensity values > 0 per protein (number of samples that the protein is detected in) per group of the selected level. The data is plotted and saved using save_detection_counts_results() as a bar diagram showing how often proteins are detected in a given number of samples/replicates for each group. To view adjustable parameters see “plot_detection_counts_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
plot_experiment_comparison(dfs_to_use, levels, **kwargs)
To generate the experiment comparison plot, get_experiment_comparison_data() is used to retrieve protein intensity values for all samples of a given group and to classify the proteins that can be compared between groups and those that are unique to each group (see Thresholds and Comparisons). Then the mean intensity of these proteins is calculated. For all groups of the selected level, pairwise comparisons of the protein intensities are plotted and their Pearson’s correlation coefficient r^2 is calculated. Unique proteins per group are shown at the bottom and right side of the graph (missing values are substituted by the minimum value of the data set). The calculated Pearson’s correlation coefficient r^2 is additionally visualized in the form of a correlation heatmap. For every pairwise comparison of the groups from the selected level, one scatter plot is created, and the results of all pairwise comparisons together are visualized in one combined correlation heatmap using save_scatter_replicates_results(). To view adjustable parameters see “plot_experiment_comparison_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
Note
To determine which proteins can be compared between the two groups and which are unique for one group an internal threshold function is applied.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
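The mean-intensity comparison and the r^2 value can be sketched as follows. Because the internal threshold function is not documented here, the sketch substitutes a simplified rule (a protein counts as shared if it has at least one value in each group, otherwise it is exclusive); that rule and all helper names are assumptions.

```python
import math
import statistics
from typing import Dict, List, Optional

Group = Dict[str, List[Optional[float]]]  # protein -> replicate intensities (None = missing)


def mean_intensities(group: Group) -> Dict[str, float]:
    # mean over the non-missing replicates; proteins without any value are skipped
    return {
        p: statistics.fmean(v for v in vals if v is not None)
        for p, vals in group.items()
        if any(v is not None for v in vals)
    }


def pearson_r(x: List[float], y: List[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_groups(g1: Group, g2: Group) -> dict:
    """Hypothetical helper using the simplified 'shared vs exclusive' rule above."""
    m1, m2 = mean_intensities(g1), mean_intensities(g2)
    shared = sorted(m1.keys() & m2.keys())
    r = pearson_r([m1[p] for p in shared], [m2[p] for p in shared])
    return {
        "shared": shared,
        "exclusive_1": sorted(m1.keys() - m2.keys()),
        "exclusive_2": sorted(m2.keys() - m1.keys()),
        "r2": r ** 2,
    }
```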
plot_go_analysis(dfs_to_use, levels, **kwargs)
In the GO analysis, an enrichment analysis is performed for each selected GO term file (based on protein counts, i.e. proteins with intensity value > 0). For this analysis get_go_analysis_data() is used to calculate the number of detected proteins from a GO term that are found in each group of the selected level. This count is illustrated as the length of the corresponding bar. P-values shown at the end of a bar indicate the calculated significance. Samples referred to as “Total” represent the complete data set, and the numbers at the top of the graph correspond to the count of detected proteins in all samples over the total number of proteins in the GO term. The data of all chosen GO lists is plotted and saved in one graph using save_go_analysis_results().
For the p-value calculation, first, for each GO term, a list “pathway_genes” is created by taking the intersection of the proteins from the GO list and all detected proteins. Second, a list “non_pathway_genes” is created, which comprises all detected proteins except those in “pathway_genes”. Third, the lists “experiment_genes” and “non_experiment_genes” are created in a similar fashion, where an experiment refers to a sample/group of samples of the data set. Lastly, a one-tailed Fisher’s exact test is calculated to retrieve statistical significances based on the following contingency table:
                     in pathway                              not in pathway
in experiment        experiment_genes & pathway_genes        experiment_genes & non_pathway_genes
not in experiment    non_experiment_genes & pathway_genes    non_experiment_genes & non_pathway_genes
The resulting p-value thus also depends on the overall protein count of the sample/group of samples. A sample is considered significant if the p-value is < 0.05. To view adjustable parameters see “plot_go_analysis_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
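The contingency table and the one-tailed test described for plot_go_analysis() can be sketched with the standard library. This is a minimal sketch, not mspypeline's code: it assumes the one-tailed test checks for enrichment (over-representation of the “in experiment & in pathway” cell), computed as the hypergeometric upper tail, and the helper names are hypothetical.

```python
from math import comb
from typing import Set


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    testing whether the top-left cell is larger than expected."""
    n_total = a + b + c + d
    in_pathway = a + c     # column total "in pathway"
    in_experiment = a + b  # row total "in experiment"
    upper = min(in_pathway, in_experiment)
    # hypergeometric upper tail P(X >= a) with all margins fixed
    tail = sum(
        comb(in_pathway, k) * comb(n_total - in_pathway, in_experiment - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, in_experiment)


def go_term_p_value(detected: Set[str], go_list: Set[str], experiment: Set[str]) -> float:
    # experiment: proteins detected in the sample/group of samples being tested
    pathway_genes = go_list & detected
    non_pathway_genes = detected - pathway_genes
    experiment_genes = experiment & detected
    non_experiment_genes = detected - experiment_genes
    return fisher_exact_greater(
        len(experiment_genes & pathway_genes),
        len(experiment_genes & non_pathway_genes),
        len(non_experiment_genes & pathway_genes),
        len(non_experiment_genes & non_pathway_genes),
    )
```

For the table [[3, 0], [0, 3]] this gives 1/20 = 0.05, which is not below the 0.05 cutoff, illustrating how strongly the result depends on the overall protein counts.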
plot_heatmap_overview_all_normalizers(dfs_to_use, levels, **kwargs)
Creates the intensity heatmap overview for all normalization methods. The intensity heatmap shows protein intensities, where samples are given in rows on the y axis and proteins on the x axis. Missing values are colored in gray. The heatmap can be used to spot patterns in the different normalization methods and to understand how different intensity types affect the data. To view adjustable parameters see “plot_heatmap_overview_all_normalizers_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use – which dataframes/intensities should be plotted
levels – at which level of the data tree should the data be compared
kwargs – accepts kwargs
plot_intensity_heatmap(dfs_to_use, levels, **kwargs)
The intensity heatmap shows protein intensities (derived from get_intensity_heatmap_data()), where samples are given in rows on the y axis and proteins on the x axis. Missing values are colored in gray. The data is plotted and saved using save_intensities_heatmap_result(). The heatmap can be used to spot patterns in the different normalization methods and to understand how different intensity types affect the data. The Heatmap overview is created from a series of intensity heatmap plots. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
plot_intensity_histograms(dfs_to_use, levels, **kwargs)
Uses get_intensity_histograms_data() to get protein intensity values for each sample per group of the selected level. The intensity values of each sample are binned (default = 25 bins), and the data of each sample from a group of the selected level is plotted and saved in one histogram using save_intensity_histogram_results(). If the parameter “show_mean” is set to True in the configs, the mean intensity of the plotted samples of a group is shown as a gray dashed line. To view adjustable parameters see “plot_intensity_histograms_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
plot_kde(dfs_to_use, levels, **kwargs)
In the kernel density estimate (KDE) plot, one density graph per sample is plotted, indicating the intensity (derived from get_kde_data()) on the x axis and the density on the y axis. The data is plotted and saved using save_kde_results(). These plots should be presented on a log2 scale. The KDE is well suited to study the influence of different normalization methods and protein intensities on the data, which is why it is part of the Normalization overview. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
plot_n_proteins_vs_quantile(dfs_to_use, levels, **kwargs)
Plots the quantile protein intensities against the number of identified proteins per sample. get_n_protein_vs_quantile_data() is used to get protein intensities for all samples per group and subsequently count the number of intensity values > 0 (total number of detected proteins) and the quantiles per sample. The data is visualized and saved by save_n_proteins_vs_quantile_results(). Samples are indicated as a horizontal line of scatter dots, where the color and x position of a dot indicate the intensity value of the respective quantile. The y position of the dots of a sample corresponds to the total number of detected proteins in that sample. Solid, rather vertical lines indicate a linear fit of each quantile across all samples. This plot is part of the Normalization overview. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
plot_normalization_overview(dfs_to_use, levels, **kwargs)
The Normalization overview offers the opportunity to examine different aspects of the data in three distinct plots. For each normalization method provided, an additional page is attached to the resulting pdf file, starting with the raw (not normalized) data. That way it is possible to get a better understanding of the effects of the normalization methods on the data, to inspect the different approaches and to find the most suitable normalization for the data. The normalization overview combines the plots plot_kde() (see KDE example), plot_n_proteins_vs_quantile() (see proteins vs quantiles example) and plot_boxplot() (see boxplot example). To view adjustable parameters see “plot_normalization_overview_all_normalizers_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
plot_normalization_overview_all_normalizers(dfs_to_use, levels, **kwargs)
Creates the plot_normalization_overview() for all normalization methods. To view adjustable parameters see “plot_normalization_overview_all_normalizers_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use – which dataframes/intensities should be plotted
levels – at which level of the data tree should the data be compared
kwargs – accepts kwargs
plot_pathway_analysis(dfs_to_use, levels, **kwargs)
In the pathway analysis, for each protein of a desired pathway a subplot is created displaying the intensities of the protein for all groups of the selected level. First, get_pathway_analysis_data() is used to filter out all proteins of the desired pathways for all samples per group of the selected level. The function then determines which of those proteins can be compared between samples (see Thresholds and Comparisons), and the significances of these protein intensities are calculated for each pairwise comparison between groups with an independent t-test. P-value thresholds are set as follows: * is p < 0.05, ** is p < 0.005, and *** is p < 0.0005. For every selected pathway, two figures are created and saved using save_pathway_analysis_results(), one displaying the significances and the other not displaying them. For a group of multiple samples, the protein intensity is plotted for each sample (single scatter dot), and the samples of a group are jointly presented in uniform coloring. To view adjustable parameters see “plot_pathway_analysis_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
Note
To determine which proteins can be compared between two groups an internal threshold function is applied.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
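The star thresholds listed for plot_pathway_analysis() map directly onto a small function. The sketch below is hypothetical (not mspypeline's code) and assumes the p-values of the pairwise t-tests have already been computed and are keyed by group pair.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def significance_stars(p: float) -> str:
    # thresholds as documented: * p < 0.05, ** p < 0.005, *** p < 0.0005
    if p < 0.0005:
        return "***"
    if p < 0.005:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def pairwise_annotations(
    groups: List[str], pvalues: Dict[Tuple[str, str], float]
) -> Dict[Tuple[str, str], str]:
    # every unordered pair of groups is considered once; pairs without a p-value are skipped
    return {
        pair: significance_stars(pvalues[pair])
        for pair in combinations(groups, 2)
        if pair in pvalues
    }
```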
plot_pathway_timecourse(df_to_use='raw', show_suptitle=False, levels=2, **kwargs)
Not yet implemented.
- Parameters
df_to_use (str) –
show_suptitle (bool) –
levels (Iterable) –
plot_pca_overview(dfs_to_use, levels, **kwargs)
With the option to perform a PCA, the data can be studied for its variance, and in doing so, the parameters that have most strongly affected the variability between samples can be determined. The created PCA plot compares all components against each other (default = 2 components). PCA results are calculated using get_pca_data(), which gets protein intensities for all samples per group, processes the data according to the given arguments, and then performs a dimensionality reduction (PCA) using sklearn.decomposition.PCA. Multiple different analysis options can be chosen to generate a PCA (see: multiple option config). The results do not depend on the chosen level; however, the level on which the data should be compared determines the coloring of the scatter elements, with each group of the selected level colored differently. The data is plotted and saved using save_pca_results(). To view adjustable parameters see “plot_pca_overview_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
plot_r_volcano(dfs_to_use, levels, sample1=None, sample2=None, **kwargs)
A volcano plot illustrates the statistical inferences from a pairwise comparison of two groups. The plot shows the log2 fold change between two different conditions against the -log10(p-value) (based on protein intensities). The p-value and adjusted p-value (Benjamini-Hochberg) are determined using the R limma package (moderated t-statistic). Additionally, the calculations are corrected for the intensity-variance relationship. For the calculation of all these parameters get_r_volcano_data() is applied. Dashed lines indicate the fold change cutoff (default = log2(2)) and the p-value cutoff (default = p < 0.05), by which proteins are considered significant (blue and red) or non-significant (gray). Measured intensities of unique proteins are indicated at the sides of the volcano plot for each group (light blue and orange). Volcano plots also permit the annotation of mapped proteins. This can be achieved by labeling a number of the most significant proteins for each group or by selecting a pathway analysis protein list. For every pairwise comparison of the groups of the selected level, two volcano plots are created and saved using save_volcano_results(), where one plot has a set of proteins annotated and the other does not. To view adjustable parameters see “plot_r_volcano_settings:” in the Adjustable Options Configs. For an overview of plots see analysis options. For an exemplary plot see gallery.
Note
Should be used with log2 intensities.
A minimum of 3 samples per group is required.
Note
To determine which proteins can be compared between the two groups and which are unique for one group an internal threshold function is applied.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
sample1 (str) – first sample that should be compared (downregulated)
sample2 (str) – second sample that should be compared (upregulated)
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
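The moderated t-statistic itself comes from limma and is not reproduced here. As a sketch of the post-processing described above, assuming per-protein log2 fold changes (sample2 vs. sample1, so positive means higher in sample2) and raw p-values are already available, the following applies the Benjamini-Hochberg adjustment and the default cutoffs. Whether mspypeline compares strictly or inclusively at the cutoffs is an assumption here, and all helper names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2_fc, pvalue, fc_cutoff=math.log2(2), p_cutoff=0.05):
    """Label one protein using the default fold change and p-value cutoffs."""
    if pvalue < p_cutoff and abs(log2_fc) > fc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "not significant"


def volcano_point(log2_fc, pvalue):
    """Plot coordinates: log2 fold change (x) against -log10(p-value) (y)."""
    return log2_fc, -math.log10(pvalue)
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.20])` gives roughly `[0.04, 0.0533, 0.0533, 0.2]`, the same values as R's `p.adjust(p, method = "BH")`.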
-
plot_rank
(dfs_to_use, levels, **kwargs)¶ - In the rank plot, all proteins are sorted by intensity value using get_rank_data() and plotted against their rank. For every group of the selected level, one plot is created and saved by save_rank_results(), averaging the protein intensities of the replicates of a group. The highest intensity is assigned rank 0 and the lowest intensity rank n - 1, where n is the number of ranked proteins; proteins with missing values are disregarded. The median intensity of all proteins is given in the legend. Pathway analysis protein lists can be applied to the rank plot to provide information about the median intensity or rank of pathways of interest. If a protein is part of a selected pathway, it is shown in color, and the median rank of all proteins of that pathway is indicated. Multiple pathways can be selected and are then represented in the same graph as distinct groups.
To view adjustable parameters, see “plot_rank_settings:” in the Adjustable Options Configs. For an overview of plots, see analysis options. For an exemplary plot, see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
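A minimal sketch of the ranking logic, assuming a hypothetical dict of replicate intensities per protein in which missing values are `None`. Averaging over the measured replicates and dropping proteins with no measured value is an assumed reading of how missing values are disregarded; the function names are hypothetical.

```python
from statistics import median


def rank_proteins(intensities):
    """Average replicates per protein and rank by intensity (0 = highest).

    intensities: dict protein -> list of replicate intensities, None = missing.
    Proteins without any measured value are left out.
    """
    means = {}
    for protein, values in intensities.items():
        measured = [v for v in values if v is not None]
        if measured:
            means[protein] = sum(measured) / len(measured)
    ordered = sorted(means, key=means.get, reverse=True)
    ranks = {protein: rank for rank, protein in enumerate(ordered)}
    return means, ranks


def pathway_median_rank(ranks, pathway_proteins):
    """Median rank of the pathway proteins that could be ranked (None if none)."""
    hits = [ranks[p] for p in pathway_proteins if p in ranks]
    return median(hits) if hits else None
```

The median intensity shown in the legend would then correspond to `median(means.values())`.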
See also
-
plot_relative_std
(dfs_to_use, levels, **kwargs)¶ - Illustrates the relative standard deviation of the proteins between the samples of a group, which helps to assess how much the measured intensities fluctuate between replicates. A low deviation indicates that the measured intensities are stable across multiple samples. For each group of the selected level, one plot is created. The method applies get_relative_std_data() to determine which proteins of a group can be used for the analysis (see Thresholds and Comparisons) and to filter out proteins below the threshold. Then, save_relative_std_results() is used to calculate the relative standard deviation and to plot and save the data. Lines drawn in different shades of blue indicate arbitrarily chosen thresholds of 10%, 20% and 30% relative std, together with the number of proteins whose relative std is below each of these values.
To view adjustable parameters, see “plot_relative_std_settings:” in the Adjustable Options Configs. For an overview of plots, see analysis options. For an exemplary plot, see gallery.
Note
To determine which proteins can be compared between the two samples an internal threshold function is applied.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
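A minimal sketch of the relative std computation and the threshold counts, assuming relative std means the sample standard deviation divided by the mean, in percent (coefficient of variation). mspypeline may use a different degrees-of-freedom convention, and the `min_replicates` filter is a hypothetical stand-in for the internal threshold function.

```python
from statistics import mean, stdev


def relative_std(values):
    """Relative standard deviation (coefficient of variation) in percent."""
    return 100 * stdev(values) / mean(values)


def relative_std_per_protein(intensities, min_replicates=2):
    """intensities: dict protein -> replicate intensities (None = missing).

    Proteins with fewer than min_replicates measured values are skipped
    (hypothetical stand-in for mspypeline's internal threshold).
    """
    result = {}
    for protein, values in intensities.items():
        measured = [v for v in values if v is not None]
        if len(measured) >= min_replicates:
            result[protein] = relative_std(measured)
    return result


def count_below(rel_stds, thresholds=(10, 20, 30)):
    """Number of proteins whose relative std (%) is below each threshold."""
    return {t: sum(1 for r in rel_stds if r < t) for t in thresholds}
```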
See also
-
plot_scatter_replicates
(dfs_to_use, levels, **kwargs)¶ - Uses get_scatter_replicates_data() to retrieve protein intensity values for each sample of a selected group. For all samples/replicates per group of the selected level, pairwise comparisons of the protein intensities are plotted and their Pearson’s correlation coefficient is calculated and reported as r². Proteins unique to one replicate are shown at the bottom and right side of the graph (NA values are replaced by the minimum value of the data set). The calculated r² is additionally visualized as a correlation heatmap. For a group with more than 2 replicates, every pairwise comparison of the replicates is calculated and plotted together in one graph. For every group of the selected level, one scatter plot and one correlation heatmap are created and saved using save_scatter_replicates_results().
To view adjustable parameters, see “plot_scatter_replicates_settings:” in the Adjustable Options Configs. For an overview of plots, see analysis options. For an exemplary plot, see gallery.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
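A minimal sketch of the pairwise r² calculation, assuming a hypothetical dict of replicate intensity lists aligned by protein (`None` = missing) and assuming that only proteins measured in both replicates enter the correlation. The NA replacement described above is only needed for drawing unique proteins and is left out here.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else float("nan")  # constant input: undefined


def pairwise_r2(replicates):
    """r² for every pair of replicates.

    replicates: dict name -> intensities aligned by protein (None = missing).
    """
    result = {}
    for a, b in combinations(sorted(replicates), 2):
        # Keep only proteins measured in both replicates.
        pairs = [(x, y) for x, y in zip(replicates[a], replicates[b])
                 if x is not None and y is not None]
        if len(pairs) < 2:
            result[(a, b)] = float("nan")
            continue
        xs, ys = zip(*pairs)
        result[(a, b)] = pearson_r(xs, ys) ** 2
    return result
```

The resulting dict maps each replicate pair to one cell of the correlation heatmap.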
See also
-
plot_venn_groups
(dfs_to_use, levels, **kwargs)¶ - Venn diagrams illustrate set relationships graphically. In mspypeline, the detected proteins (proteins with an intensity value > 0) constitute the sets, and the set relationships indicate the number of proteins that are shared between two or more sets. This allows the similarity of the detected proteins of the sets to be assessed. The function get_venn_group_data() is used to determine which proteins can be compared between groups or are unique to a group of the selected level (see Thresholds and Comparisons) and then counts these proteins per group. The method then creates and saves both a Venn diagram using save_venn() and a bar-venn diagram using save_bar_venn(), comparing the similarity of the groups on the selected level (based on protein counts). The ordinary Venn diagram is intuitive, but in mspypeline it supports a maximum of three sets. The bar-venn diagram has the advantage of allowing more than three comparison sets. It consists of two combined graphs: the upper bar diagram indicates the number of proteins that are unique to a set or shared by overlapping sets, and the lower graph indicates which set or sets are being compared, i.e. which comparison (lower graph) each protein count (upper graph) belongs to.
To view adjustable parameters, see “plot_venn_groups_settings:” in the Adjustable Options Configs. For an overview of plots, see analysis options. For an exemplary plot, see gallery.
Note
A Venn diagram can compare a maximum of 3 sets.
A bar-venn diagram can compare more than 3 sets.
If the selected level has more than 3 groups, only the bar-venn diagram is created.
If the selected level has more than 6 groups, no diagram is created.
Note
To determine which proteins can be compared between the groups and which are unique for one group an internal threshold function is applied.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
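The upper bars of a bar-venn diagram can be read as exclusive intersection counts: each protein is counted once, under the exact combination of sets it occurs in. A minimal sketch, assuming each set already holds the protein IDs that passed the threshold for one group; `diagrams_for` mirrors the limits stated in the notes above. All names are hypothetical.

```python
from collections import Counter


def exclusive_intersections(sets):
    """Count proteins by the exact combination of sets they occur in.

    sets: dict set name -> set of protein ids.
    Returns a Counter mapping a sorted tuple of set names to the number of
    proteins found in exactly those sets.
    """
    names = sorted(sets)
    counts = Counter()
    for protein in set().union(*sets.values()):
        members = tuple(name for name in names if protein in sets[name])
        counts[members] += 1
    return counts


def diagrams_for(n_sets):
    """Diagram types created for a given number of groups, per the notes."""
    if n_sets > 6:
        return []
    if n_sets > 3:
        return ["bar-venn"]
    return ["venn", "bar-venn"]
```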
See also
-
plot_venn_results
(dfs_to_use, levels, **kwargs)¶ - Venn diagrams illustrate set relationships graphically. In mspypeline, the detected proteins (proteins with an intensity value > 0) constitute the sets, and the set relationships indicate the number of proteins that are shared between two or more sets. This allows the similarity of the detected proteins of the sets to be assessed. The function get_venn_data_per_key() is used to count the protein intensity values > 0 (number of detected proteins) for each replicate of a group from the selected level. The method creates and saves both a Venn diagram using save_venn() and a bar-venn diagram using save_bar_venn(), comparing the similarity of the replicates of each group from the selected level (based on protein counts). The ordinary Venn diagram is intuitive, but in mspypeline it supports a maximum of three sets. The bar-venn diagram has the advantage of allowing more than three comparison sets. It consists of two combined graphs: the upper bar diagram indicates the number of proteins that are unique to a set or shared by overlapping sets, and the lower graph indicates which set or sets are being compared, i.e. which comparison (lower graph) each protein count (upper graph) belongs to.
To view adjustable parameters, see “plot_venn_results_settings:” in the Adjustable Options Configs. For an overview of plots, see analysis options. For an exemplary plot, see gallery.
Note
A Venn diagram can compare a maximum of 3 sets.
A bar-venn diagram can compare more than 3 sets.
If a group of the selected level has more than 3 replicates, only the bar-venn diagram is created.
If the selected level has more than 6 groups, no diagram is created.
- Parameters
dfs_to_use (Union[str, Iterable[str]]) – which dataframes/intensities should be plotted
levels (Union[int, Iterable[int]]) – at which level of the data tree should the data be compared
kwargs – accepts kwargs
- Returns
A list of all created plots.
- Return type
List
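A minimal sketch of the per-replicate counting, assuming a hypothetical nested dict replicate → protein → intensity, where a protein counts as detected when its intensity is greater than 0. The function names are hypothetical and only illustrate the counting step, not the plotting.

```python
def detected_per_replicate(intensities):
    """intensities: dict replicate -> dict protein -> intensity (None = missing).

    Returns dict replicate -> set of detected proteins (intensity > 0).
    """
    return {
        replicate: {p for p, v in values.items() if v is not None and v > 0}
        for replicate, values in intensities.items()
    }


def summarize(detected):
    """Protein count per replicate and the number detected in every replicate."""
    counts = {replicate: len(proteins) for replicate, proteins in detected.items()}
    shared_by_all = set.intersection(*detected.values()) if detected else set()
    return counts, len(shared_by_all)
```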
See also
-
MaxQuantPlotter¶
-
class
mspypeline.
MaxQuantPlotter
(start_dir, reader_data, intensity_df_name='proteinGroups', interesting_proteins=None, go_analysis_gene_names=None, configs=None, required_reader='mqreader', intensity_entries=(('raw', 'Intensity ', 'Intensity'), ('lfq', 'LFQ intensity ', 'LFQ intensity'), ('ibaq', 'iBAQ ', 'iBAQ intensity')), loglevel=10)¶ MaxQuantPlotter is a child class of BasePlotter and inherits all its functionality to get data and generate plots.
-
__init__
(start_dir, reader_data, intensity_df_name='proteinGroups', interesting_proteins=None, go_analysis_gene_names=None, configs=None, required_reader='mqreader', intensity_entries=(('raw', 'Intensity ', 'Intensity'), ('lfq', 'LFQ intensity ', 'LFQ intensity'), ('ibaq', 'iBAQ ', 'iBAQ intensity')), loglevel=10)¶ - Parameters
start_dir (str) – location to save results
reader_data (dict) – mapping to provide input data
intensity_df_name (str) – name/key to input data
interesting_proteins (dict) – mapping with pathway proteins to analyze
go_analysis_gene_names (dict) – mapping with go terms to analyze
configs (dict) – mapping of configuration
required_reader – name of the file reader
intensity_entries – tuple of (key in all_tree_dict, prefix in data, name in plot) tuples. See add_intensity_column().
loglevel – level of the logger
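The default intensity_entries is a tuple of (key in all_tree_dict, prefix in data, name in plot) triples. The sketch below illustrates that structure under two assumptions: that a prefix selects sample columns by plain string prefix matching (so the trailing space in 'Intensity ' keeps the LFQ columns out), and that each entry yields an option plus its _log2 variant, as described for add_intensity_column(). The helper and column names are hypothetical; this is not mspypeline's implementation.

```python
# Default entries as listed in the MaxQuantPlotter signature.
DEFAULT_INTENSITY_ENTRIES = (
    ("raw", "Intensity ", "Intensity"),
    ("lfq", "LFQ intensity ", "LFQ intensity"),
    ("ibaq", "iBAQ ", "iBAQ intensity"),
)


def option_names(entries):
    """Each entry yields the option itself and its log2 variant."""
    names = []
    for option_name, _prefix, _name_in_plot in entries:
        names.extend([option_name, f"{option_name}_log2"])
    return names


def sample_columns(prefix, columns):
    """Map sample name -> column for all columns starting with prefix (assumed rule)."""
    return {col[len(prefix):]: col for col in columns if col.startswith(prefix)}
```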
-
create_report
(target_dir=None)¶ Creates a MaxQuantReport.pdf, which can be used for quality control.
For an overview of plots, see analysis options. For an exemplary plot, see gallery.
- Parameters
target_dir (str) – directory where report will be written
-
classmethod
from_MSPInitializer
(mspinit_instance, **kwargs)¶ - Creates a BasePlotter from an MSPInitializer.
- Parameters
mspinit_instance (mspypeline.core.MSPInitializer.MSPInitializer) – instance of an MSPInitializer used to get the correct inputs for the plotter.
kwargs – all kwargs that are passed to BasePlotter.__init__() can be overwritten by passing them as kwargs.
- Returns
functional plotter
- Return type
-
classmethod
from_file_reader
(reader_instance, **kwargs)¶ - Creates a plotter (BasePlotter or MaxQuantPlotter) from a BaseReader.
- Parameters
reader_instance (mspypeline.core.MSPPlots.MaxQuantPlotter.MQReader) – instance of a BaseReader used to get the correct inputs for the plotter.
kwargs – all kwargs that are passed to BasePlotter.__init__() can be overwritten by passing them as kwargs.
- Returns
functional plotter
- Return type
-