Coding examples via Python
Create a plotter from a custom dataset
When the data is available in some other format, or has already been preprocessed, a plotter can be created by passing the data directly as a pandas DataFrame.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: from mspypeline import BasePlotter
In [4]: from mspypeline.helpers import get_analysis_design
In [5]: samples = [f"Tumor_Stage{s}_Experiment{e}" for s in ["I", "II", "III", "IV"] for e in range(1, 4)] + [
...: f"Control_Stage{s}_Experiment{e}" for s in ["I", "II", "III", "IV"] for e in range(1, 4)]
...:
In [6]: data = pd.DataFrame(np.exp2(np.random.normal(26, 3, (100, 24))).astype(np.int64), columns=samples)
In [7]: data.iloc[:, 12:] = data.iloc[:, 12:] + 1e8 # offset the Control samples (columns 12 onwards); only needed for the plots later on, not required here
In [8]: analysis_design = get_analysis_design(samples)
In [9]: plotter = BasePlotter("result_dir", reader_data={"custom_reader": {"my_data": data}},
...: intensity_df_name="my_data", configs={"analysis_design": analysis_design},
...: required_reader="custom_reader", intensity_entries=[("raw", "", "Intensity")])
...:
In [10]: plotter.all_tree_dict["raw_log2"].aggregate(None, None).head()
Out[10]:
Tumor_StageI_Experiment1 ... Control_StageIV_Experiment3
0 34.106160 ... 26.733788
1 23.666637 ... 26.668590
2 26.124302 ... 27.736005
3 28.217995 ... 26.615706
4 27.921138 ... 29.619297
[5 rows x 24 columns]
In [11]: plotter.all_tree_dict["raw_log2"].groupby(0).head()
Out[11]:
level_0 Tumor Control
0 27.546760 27.793017
1 26.796748 27.494703
2 24.910609 27.789710
3 25.380884 27.465529
4 25.769436 28.208850
Alternatively, the plotter can be created first and the intensity data added afterwards:
In [12]: plotter = BasePlotter("result_dir", configs={"analysis_design": analysis_design})
In [13]: plotter.add_intensity_column("raw", "", "Intensity", df=data)
In [14]: plotter.all_tree_dict["raw_log2"].aggregate(None, None).head()
Out[14]:
Tumor_StageI_Experiment1 ... Control_StageIV_Experiment3
0 34.106160 ... 26.733788
1 23.666637 ... 26.668590
2 26.124302 ... 27.736005
3 28.217995 ... 26.615706
4 27.921138 ... 29.619297
[5 rows x 24 columns]
In [15]: plotter.all_tree_dict["raw_log2"].groupby(0).head()
Out[15]:
level_0 Tumor Control
0 27.546760 27.793017
1 26.796748 27.494703
2 24.910609 27.789710
3 25.380884 27.465529
4 25.769436 28.208850
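To make the two views above more concrete, the following is a minimal pandas-only sketch of what they appear to compute: a log2 transform of the raw intensities, followed by a per-row mean over all samples that share the same first name component (level 0). It does not use mspypeline, and it is an assumption based on the output shown above, not a description of the library's internals. The names `raw`, `raw_log2`, `level_0` and `group_means` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same naming scheme as above: Group_Stage_Experiment.
groups = ["Tumor", "Control"]
stages = ["I", "II", "III", "IV"]
samples = [f"{g}_Stage{s}_Experiment{e}"
           for g in groups for s in stages for e in range(1, 4)]

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
raw = pd.DataFrame(
    np.exp2(rng.normal(26, 3, (100, len(samples)))).astype(np.int64),
    columns=samples,
)

# log2-transform the raw intensities (what the "raw_log2" name suggests).
raw_log2 = np.log2(raw)

# Level 0 of each sample name is the text before the first underscore.
level_0 = raw_log2.columns.str.split("_").str[0]

# Average, per row, all samples that share the same level-0 label.
group_means = raw_log2.T.groupby(level_0, sort=False).mean().T
print(group_means.head())
```

Splitting on further underscores would give the deeper levels (stage, experiment) in the same way.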
Add normalization options
Data can be normalized by adding a normalized option to the plotter. This takes three arguments: the name of the source data, a Normalizer, and a name for the new option.
In [16]: from mspypeline import MedianNormalizer
In [17]: plotter.add_normalized_option("raw", MedianNormalizer, "median_norm")
In [18]: plotter.all_tree_dict.keys()
Out[18]: dict_keys(['raw', 'raw_log2', 'raw_median_norm', 'raw_median_norm_log2'])
This added two new options: ‘raw_median_norm’ and ‘raw_median_norm_log2’.
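As a rough illustration of the idea behind median normalization, here is a small pandas-only sketch. It uses one common definition, dividing every sample by its own median, which is an assumption and may differ from what `MedianNormalizer` does internally. The sample names and the data are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical intensities: 51 proteins x 4 samples.
raw = pd.DataFrame(
    np.exp2(rng.normal(26, 3, (51, 4))),
    columns=["A", "B", "C", "D"],
)
raw["C"] *= 4  # simulate a sample with systematically higher intensities

# Divide each sample by its own median so that all samples share a median of 1.
# (One common definition of median normalization; an assumption here.)
median_norm = raw / raw.median(axis=0)

# The log2 variant then has a median of 0 in every sample.
median_norm_log2 = np.log2(median_norm)

print(raw.median())
print(median_norm.median())
```

After this step the systematic shift in sample "C" no longer shows up in the sample medians, which is the effect a median-based normalizer is meant to achieve.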
Global protein level
This example plot compares the global protein level of the Control and Tumor groups. Each protein is drawn as a thin line, and the mean and standard deviation of each group are overlaid as error bars.
In [19]: import matplotlib.pyplot as plt
In [20]: from matplotlib.collections import LineCollection
In [21]: def create_plot(intensity):
....:     # grouped data of each group for the chosen intensity option
....:     df_grouped_normal = plotter.all_tree_dict[intensity]["Control"].groupby()
....:     df_grouped_tumor = plotter.all_tree_dict[intensity]["Tumor"].groupby()
....:     # one thin line per protein: a list of (x, y) points across the grouped columns
....:     segs_normal = [list(enumerate(df_grouped_normal.loc[protein]))
....:                    for protein in df_grouped_normal.index]
....:     linecoll_normal = LineCollection(segs_normal, color="gray", linewidth=0.05)
....:     segs_tumor = [list(enumerate(df_grouped_tumor.loc[protein]))
....:                   for protein in df_grouped_tumor.index]
....:     linecoll_tumor = LineCollection(segs_tumor, color="lightcoral", linewidth=0.05)
....:     fig, ax = plt.subplots(1, 1, figsize=(14, 7))
....:     ax.add_collection(linecoll_normal)
....:     ax.add_collection(linecoll_tumor)
....:     # overlay the column-wise mean with the standard deviation as error bars
....:     ax.errorbar(range(len(df_grouped_normal.columns)), df_grouped_normal.mean(),
....:                 yerr=df_grouped_normal.std(), color="black")
....:     ax.errorbar(range(len(df_grouped_tumor.columns)), df_grouped_tumor.mean(),
....:                 yerr=df_grouped_tumor.std(), color="red")
....:     # add_collection does not rescale the axes, so the limits are set by hand
....:     ax.set_ylim(18, 40)
....:     ax.set_xlim(-0.5, 3.5)
....:     ax.set_xticks([0, 1, 2, 3])
....:     ax.set_xticklabels(df_grouped_tumor.columns)
....:     ax.set_ylabel(intensity)
....:     return fig
....:
In [22]: create_plot("raw_log2");
In [23]: create_plot("raw_median_norm_log2");
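The two ingredients of the plot, the per-protein line segments and the per-column mean and standard deviation, can also be built without any plotting code. The sketch below does this with NumPy and pandas on hypothetical data. It assumes that the grouped DataFrame has one row per protein and one column per stage, which the four x ticks above suggest but which is not confirmed here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
stages = ["StageI", "StageII", "StageIII", "StageIV"]
# Hypothetical per-stage log2 intensities for 100 proteins.
df_grouped = pd.DataFrame(rng.normal(27, 3, (100, len(stages))), columns=stages)

# x positions 0..3, one per stage column.
x = np.arange(df_grouped.shape[1])

# One polyline per protein: an array of shape (n_proteins, n_stages, 2)
# holding (x, y) pairs, a layout that LineCollection accepts as segments.
segments = np.stack(
    [np.broadcast_to(x, df_grouped.shape), df_grouped.to_numpy()], axis=-1
)

# Column-wise mean and sample standard deviation, the values passed to ax.errorbar.
stage_mean = df_grouped.mean()
stage_std = df_grouped.std()

print(segments.shape)
print(pd.DataFrame({"mean": stage_mean, "std": stage_std}))
```

Building the segments as a single array avoids the nested list comprehension and gives the same lines when passed to `LineCollection`.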

