Coding examples via Python

Create a plotter from a custom dataset

When the data is available in another format or has already been preprocessed, a plotter can be created by passing the data directly as a DataFrame.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: from mspypeline import BasePlotter

In [4]: from mspypeline.helpers import get_analysis_design

In [5]: samples = [f"Tumor_Stage{s}_Experiment{e}" for s in ["I", "II", "III", "IV"] for e in range(1, 4)] + [
   ...:     f"Control_Stage{s}_Experiment{e}" for s in ["I", "II", "III", "IV"] for e in range(1, 4)]
   ...: 

In [6]: data = pd.DataFrame(np.exp2(np.random.normal(26, 3, (100, 24))).astype(np.int64), columns=samples)

In [7]: data.iloc[:, 12:] = data.iloc[:, 12:] + 1e8  # offsets the Control samples; only used in later examples, not required here

In [8]: analysis_design = get_analysis_design(samples)

In [9]: plotter = BasePlotter("result_dir", reader_data={"custom_reader": {"my_data": data}},
   ...:     intensity_df_name="my_data", configs={"analysis_design": analysis_design},
   ...:     required_reader="custom_reader", intensity_entries=[("raw", "", "Intensity")])
   ...: 

In [10]: plotter.all_tree_dict["raw_log2"].aggregate(None, None).head()
Out[10]: 
   Tumor_StageI_Experiment1  ...  Control_StageIV_Experiment3
0                 34.106160  ...                    26.733788
1                 23.666637  ...                    26.668590
2                 26.124302  ...                    27.736005
3                 28.217995  ...                    26.615706
4                 27.921138  ...                    29.619297

[5 rows x 24 columns]

In [11]: plotter.all_tree_dict["raw_log2"].groupby(0).head()
Out[11]: 
level_0      Tumor    Control
0        27.546760  27.793017
1        26.796748  27.494703
2        24.910609  27.789710
3        25.380884  27.465529
4        25.769436  28.208850
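
The analysis design groups samples by the parts of their names. As a rough illustration of that idea, and not the actual implementation of get_analysis_design, the sketch below nests the sample names by their underscore-separated parts. The helper build_design and the choice to store the full sample name at each leaf are assumptions made for this example.

```python
def build_design(names, sep="_"):
    """Nest sample names by their separator-delimited parts (hypothetical helper)."""
    design = {}
    for name in names:
        node = design
        parts = name.split(sep)
        # Walk or create one nesting level per name part, except the last one.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Assumption: the leaf stores the full sample name.
        node[parts[-1]] = name
    return design


samples = [
    f"{group}_Stage{s}_Experiment{e}"
    for group in ("Tumor", "Control")
    for s in ("I", "II", "III", "IV")
    for e in range(1, 4)
]
design = build_design(samples)

print(list(design))           # top-level groups (level 0)
print(list(design["Tumor"]))  # stages within the Tumor group (level 1)
```

With such a nesting, level 0 separates Tumor from Control, level 1 separates the stages, and level 2 holds the individual experiments.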

Alternatively, create the plotter first and add the intensity column afterwards:

In [12]: plotter = BasePlotter("result_dir", configs={"analysis_design": analysis_design})

In [13]: plotter.add_intensity_column("raw", "", "Intensity", df=data)

In [14]: plotter.all_tree_dict["raw_log2"].aggregate(None, None).head()
Out[14]: 
   Tumor_StageI_Experiment1  ...  Control_StageIV_Experiment3
0                 34.106160  ...                    26.733788
1                 23.666637  ...                    26.668590
2                 26.124302  ...                    27.736005
3                 28.217995  ...                    26.615706
4                 27.921138  ...                    29.619297

[5 rows x 24 columns]

In [15]: plotter.all_tree_dict["raw_log2"].groupby(0).head()
Out[15]: 
level_0      Tumor    Control
0        27.546760  27.793017
1        26.796748  27.494703
2        24.910609  27.789710
3        25.380884  27.465529
4        25.769436  28.208850
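
To make the grouped output easier to interpret, here is a minimal pandas-only sketch of what grouping by level 0 appears to compute: a log2 transform followed by an aggregation over all samples that share the first part of their name. Using the mean as the aggregation is an assumption, and none of this code uses mspypeline itself.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
samples = [
    f"{group}_Stage{s}_Experiment{e}"
    for group in ("Tumor", "Control")
    for s in ("I", "II", "III", "IV")
    for e in range(1, 4)
]
raw = pd.DataFrame(np.exp2(rng.normal(26, 3, (100, 24))), columns=samples)

# log2-transform the raw intensities.
log2 = np.log2(raw)

# Level 0 of each sample name is the part before the first underscore.
level0 = log2.columns.str.split("_").str[0]

# Assumption: samples in the same group are averaged per protein.
grouped = log2.T.groupby(level0).mean().T

print(grouped.head())
```

The result has one row per protein and one column per level 0 group, analogous to the Tumor and Control columns shown above.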

Add normalization options

To normalize data, add a normalized option. Provide the name of the source data first, then a Normalizer, and finally a name for the new option.

In [16]: from mspypeline import MedianNormalizer

In [17]: plotter.add_normalized_option("raw", MedianNormalizer, "median_norm")

In [18]: plotter.all_tree_dict.keys()
Out[18]: dict_keys(['raw', 'raw_log2', 'raw_median_norm', 'raw_median_norm_log2'])

This adds two new options, ‘raw_median_norm’ and ‘raw_median_norm_log2’, alongside the existing ‘raw’ and ‘raw_log2’.
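
For intuition, the following sketch shows one common form of median normalization: subtracting each sample's median in log2 space so that all samples share a median of zero. Whether MedianNormalizer works exactly this way is not shown in this section, so treat the code as an illustration of the general technique on made-up data rather than as the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up log2 intensities: 100 proteins in 6 samples.
log2_data = rng.normal(26, 3, (100, 6))
log2_data[:, 3:] += 2.0  # simulate a systematic shift in the last three samples

# Per-sample (column-wise) medians.
medians = np.median(log2_data, axis=0)

# Subtracting the median centers every sample at zero in log2 space.
normalized = log2_data - medians

print(np.median(normalized, axis=0))
```

After this step the systematic shift between the two halves of the samples no longer shows up in their medians.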

Global protein level

This example plots the global protein level of every protein across the stages, separately for the Control and Tumor samples, and overlays the mean and standard deviation as error bars.

In [19]: import matplotlib.pyplot as plt

In [20]: from matplotlib.collections import LineCollection

In [21]: def create_plot(intensity):
   ....:     df_grouped_normal = plotter.all_tree_dict[intensity]["Control"].groupby()
   ....:     df_grouped_tumor = plotter.all_tree_dict[intensity]["Tumor"].groupby()
   ....:     segs_normal = [[(i, value) for i, value in enumerate(df_grouped_normal.loc[protein])]
   ....:         for protein in df_grouped_normal.index]
   ....:     linecoll_normal = LineCollection(segs_normal, color="gray", linewidth=0.05)
   ....:     segs_tumor = [[(i, value) for i, value in enumerate(df_grouped_tumor.loc[protein])]
   ....:         for protein in df_grouped_tumor.index]
   ....:     linecoll_tumor = LineCollection(segs_tumor, color="lightcoral", linewidth=0.05)
   ....:     fig, ax = plt.subplots(1, 1, figsize=(14, 7))
   ....:     ax.add_collection(linecoll_normal)
   ....:     ax.add_collection(linecoll_tumor)
   ....:     ax.errorbar([i for i, x in enumerate(df_grouped_normal.columns)], df_grouped_normal.mean(),
   ....:         yerr=df_grouped_normal.std(), color="black")
   ....:     ax.errorbar([i for i, x in enumerate(df_grouped_tumor.columns)], df_grouped_tumor.mean(),
   ....:         yerr=df_grouped_tumor.std(), color="red")
   ....:     ax.set_ylim(18, 40)
   ....:     ax.set_xlim(-0.5, 3.5)
   ....:     ax.set_xticks([0, 1, 2, 3])
   ....:     ax.set_xticklabels(df_grouped_tumor.columns)
   ....:     ax.set_ylabel(intensity)
   ....:     return fig
   ....: 

In [22]: create_plot("raw_log2");

In [23]: create_plot("raw_median_norm_log2");
[Figures: global protein level plots for raw_log2 (plot_global_protein_level_raw.png) and raw_median_norm_log2 (plot_global_protein_level_median_norm.png)]
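
The quantities drawn by create_plot can be reproduced without mspypeline. The sketch below uses made-up log2 data for the Control group only, averages the experiments of each stage per protein (assuming the mean is the aggregation), and then derives the per-protein line segments together with the centers and lengths of the error bars. The variable names are chosen for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
stages = ["StageI", "StageII", "StageIII", "StageIV"]
columns = [f"Control_{s}_Experiment{e}" for s in stages for e in range(1, 4)]
log2 = pd.DataFrame(rng.normal(26, 3, (100, 12)), columns=columns)

# The stage is the second underscore-separated part of each sample name.
stage_of = log2.columns.str.split("_").str[1]

# Assumption: experiments of the same stage are averaged per protein.
per_stage = log2.T.groupby(stage_of, sort=False).mean().T

# One polyline per protein: (x position, value) pairs across the stages,
# the same structure that LineCollection expects.
segments = [list(enumerate(row)) for row in per_stage.to_numpy()]

# Error bar centers and lengths: mean and standard deviation over proteins.
centers = per_stage.mean()
spread = per_stage.std()

print(per_stage.head())
print(centers)
print(spread)
```

Each entry of segments corresponds to one thin line in the figure, while centers and spread correspond to the values passed to ax.errorbar.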