Coding examples via Python
Create a plotter from a custom dataset
When the data is available in another format, or has already been preprocessed, a plotter can be created by passing the data as a pandas DataFrame.
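The sample names in this example encode a three-level design (group, stage, replicate) separated by underscores, and get_analysis_design derives the grouping from these names. The stdlib-only sketch below illustrates the kind of nested structure this produces. nested_design is a hypothetical helper, not mspypeline's implementation, and the real function's output format may differ.

```python
def nested_design(sample_names, sep="_"):
    """Hypothetical helper: nest sample names by their separator-delimited parts.

    "Tumor_StageI_Experiment1" ends up as
    design["Tumor"]["StageI"]["Experiment1"] == "Tumor_StageI_Experiment1".
    """
    design = {}
    for name in sample_names:
        *levels, leaf = name.split(sep)
        node = design
        for level in levels:
            # create intermediate levels (group, stage) on first use
            node = node.setdefault(level, {})
        # the innermost level maps to the full sample name
        node[leaf] = name
    return design


# same naming scheme and order as the example: Tumor samples, then Control samples
samples = [f"{g}_Stage{s}_Experiment{e}"
           for g in ["Tumor", "Control"]
           for s in ["I", "II", "III", "IV"]
           for e in range(1, 4)]
design = nested_design(samples)
print(list(design))                                  # ['Tumor', 'Control']
print(list(design["Tumor"]))                         # ['StageI', 'StageII', 'StageIII', 'StageIV']
print(design["Control"]["StageIV"]["Experiment3"])   # Control_StageIV_Experiment3
```

Because every level is read from the names, adding a new stage or replicate only requires following the same naming convention.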
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: from mspypeline import BasePlotter
In [4]: from mspypeline.helpers import get_analysis_design
In [5]: samples = [f"Tumor_Stage{s}_Experiment{e}" for s in ["I", "II", "III", "IV"] for e in range(1, 4)] + [
...: f"Control_Stage{s}_Experiment{e}" for s in ["I", "II", "III", "IV"] for e in range(1, 4)]
...:
In [6]: data = pd.DataFrame(np.exp2(np.random.normal(26, 3, (100, 24))).astype(np.int64), columns=samples)
In [7]: data.iloc[:, 12:] = data.iloc[:, 12:] + 1e8  # offset the Control samples for later comparisons; not required to create the plotter
In [8]: analysis_design = get_analysis_design(samples)
In [9]: plotter = BasePlotter("result_dir", reader_data={"custom_reader": {"my_data": data}},
...: intensity_df_name="my_data", configs={"analysis_design": analysis_design},
...: required_reader="custom_reader", intensity_entries=[("raw", "", "Intensity")])
...:
In [10]: plotter.all_tree_dict["raw_log2"].aggregate(None, None).head()
Out[10]:
Tumor_StageI_Experiment1 ... Control_StageIV_Experiment3
0 32.014778 ... 26.653581
1 30.577475 ... 26.968246
2 26.091613 ... 27.043309
3 23.466884 ... 27.252680
4 26.575561 ... 26.737686
[5 rows x 24 columns]
In [11]: plotter.all_tree_dict["raw_log2"].groupby(0).head()
Out[11]:
level_0 Tumor Control
0 25.846551 28.046920
1 26.031083 27.452876
2 26.913305 28.170480
3 25.993562 27.654801
4 26.284018 27.304514
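The two outputs above differ in granularity. aggregate(None, None) returns the log2 intensities of every sample, while groupby(0) reduces them to one value per top-level group (Tumor, Control). The stdlib sketch below mimics that reduction on synthetic data. It assumes that raw_log2 is the element-wise log2 of the raw intensities and that groupby(0) takes the mean over all samples of a group. group_means is a hypothetical helper, not part of mspypeline.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# same naming scheme as the example: 12 Tumor samples followed by 12 Control samples
samples = [f"{g}_Stage{s}_Experiment{e}"
           for g in ["Tumor", "Control"]
           for s in ["I", "II", "III", "IV"]
           for e in range(1, 4)]

# raw intensities 2 ** N(26, 3) for a few proteins (rows), one value per sample
raw = [[int(2 ** random.gauss(26, 3)) for _ in samples] for _ in range(5)]
# mirror the example's offset of 1e8 on the Control columns (the last 12)
raw = [row[:12] + [v + 10**8 for v in row[12:]] for row in raw]

# assumed meaning of "raw_log2": element-wise log2 of the raw intensities
log2 = [[math.log2(v) for v in row] for row in raw]


def group_means(row, names, level=0):
    """Hypothetical helper: mean log2 value per group at the given name level."""
    groups = {}
    for value, name in zip(row, names):
        groups.setdefault(name.split("_")[level], []).append(value)
    return {group: statistics.mean(values) for group, values in groups.items()}


for row in log2:
    print({group: round(mean, 2) for group, mean in group_means(row, samples).items()})

# the offset alone corresponds to roughly 2 ** 26.6
print(round(math.log2(10**8), 2))  # 26.58
```

Because the example adds 1e8 (about 2^26.6) to every Control intensity, the Control group means end up above the Tumor means. This is the difference visible in the groupby(0) output above.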
Alternatively, the data can be added to an existing plotter as an intensity column:
In [12]: plotter = BasePlotter("result_dir", configs={"analysis_design": analysis_design})
In [13]: plotter.add_intensity_column("raw", "", "Intensity", df=data)
In [14]: plotter.all_tree_dict["raw_log2"].aggregate(None, None).head()
Out[14]:
Tumor_StageI_Experiment1 ... Control_StageIV_Experiment3
0 32.014778 ... 26.653581
1 30.577475 ... 26.968246
2 26.091613 ... 27.043309
3 23.466884 ... 27.252680
4 26.575561 ... 26.737686
[5 rows x 24 columns]
In [15]: plotter.all_tree_dict["raw_log2"].groupby(0).head()
Out[15]:
level_0 Tumor Control
0 25.846551 28.046920
1 26.031083 27.452876
2 26.913305 28.170480
3 25.993562 27.654801
4 26.284018 27.304514
Add normalization options
Data can be normalized by adding a normalized option. add_normalized_option takes the name of the source intensity, a Normalizer (here the class MedianNormalizer), and a name for the new option.
In [16]: from mspypeline import MedianNormalizer
In [17]: plotter.add_normalized_option("raw", MedianNormalizer, "median_norm")
In [18]: plotter.all_tree_dict.keys()
Out[18]: dict_keys(['raw', 'raw_log2', 'raw_median_norm', 'raw_median_norm_log2'])
This adds two new options: ‘raw_median_norm’ and its log2-transformed counterpart ‘raw_median_norm_log2’.
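Median normalization is commonly applied to log2 intensities by shifting every sample so that all sample medians coincide. The sketch below shows one such formulation in plain Python. It is an assumption for illustration: MedianNormalizer may differ in details such as the target value or whether it operates on the raw or the log2 scale. median_normalize is a hypothetical helper.

```python
import statistics


def median_normalize(columns):
    """Hypothetical helper: shift each sample so that all sample medians coincide.

    columns maps sample name -> list of log2 intensities. Each column is shifted
    by (mean of all sample medians - its own median), so the medians become equal
    while the overall intensity level is preserved.
    """
    medians = {name: statistics.median(values) for name, values in columns.items()}
    target = statistics.mean(medians.values())
    return {name: [v - medians[name] + target for v in values]
            for name, values in columns.items()}


columns = {
    "Tumor_StageI_Experiment1": [25.0, 26.0, 27.0],
    "Control_StageI_Experiment1": [27.0, 28.0, 29.5],
}
normalized = median_normalize(columns)
print({name: statistics.median(values) for name, values in normalized.items()})
# {'Tumor_StageI_Experiment1': 27.0, 'Control_StageI_Experiment1': 27.0}
```

Shifting on the log2 scale corresponds to dividing by a per-sample factor on the raw scale, so the relative differences between proteins within a sample are unchanged.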
Global protein level
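This example plot compares the global protein levels of the Control and Tumor groups across the four stages. Each protein is drawn as a thin line, and the mean and standard deviation over all proteins are shown for each stage. The plot is created once for the log2 raw intensities and once for the median-normalized intensities.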
In [19]: import matplotlib.pyplot as plt
In [20]: from matplotlib.collections import LineCollection
In [21]: def create_plot(intensity):
....:     # aggregate each group to one column per stage
....:     df_grouped_normal = plotter.all_tree_dict[intensity]["Control"].groupby()
....:     df_grouped_tumor = plotter.all_tree_dict[intensity]["Tumor"].groupby()
....:     # one thin line per protein, connecting its values across the stages
....:     segs_normal = [list(enumerate(df_grouped_normal.loc[protein]))
....:                    for protein in df_grouped_normal.index]
....:     linecoll_normal = LineCollection(segs_normal, color="gray", linewidth=0.05)
....:     segs_tumor = [list(enumerate(df_grouped_tumor.loc[protein]))
....:                   for protein in df_grouped_tumor.index]
....:     linecoll_tumor = LineCollection(segs_tumor, color="lightcoral", linewidth=0.05)
....:     fig, ax = plt.subplots(1, 1, figsize=(14, 7))
....:     ax.add_collection(linecoll_normal)
....:     ax.add_collection(linecoll_tumor)
....:     # mean and standard deviation over all proteins for each stage
....:     ax.errorbar(range(len(df_grouped_normal.columns)), df_grouped_normal.mean(),
....:                 yerr=df_grouped_normal.std(), color="black")
....:     ax.errorbar(range(len(df_grouped_tumor.columns)), df_grouped_tumor.mean(),
....:                 yerr=df_grouped_tumor.std(), color="red")
....:     ax.set_ylim(18, 40)
....:     ax.set_xlim(-0.5, 3.5)
....:     ax.set_xticks([0, 1, 2, 3])
....:     ax.set_xticklabels(df_grouped_tumor.columns)
....:     ax.set_ylabel(intensity)
....:     return fig
....:
In [22]: create_plot("raw_log2");
In [23]: create_plot("raw_median_norm_log2");
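The plot reduces the data in two steps. First, each group is aggregated to one value per protein and stage. Then the mean and standard deviation over all proteins are drawn for each stage. The sketch below reproduces these two steps on a tiny hypothetical dataset (proteins P1 and P2), assuming that groupby() averages the experiments within each stage. statistics.stdev computes the sample standard deviation, which matches the default of pandas DataFrame.std (ddof=1).

```python
import statistics

# Hypothetical log2 intensities for the Control group:
# stage -> protein -> values of the three experiments
control = {
    "StageI": {"P1": [26.0, 26.5, 27.0], "P2": [24.0, 24.0, 25.0]},
    "StageII": {"P1": [27.0, 27.0, 28.0], "P2": [25.0, 25.5, 26.0]},
}

# step 1: one value per protein and stage (assumed: mean over the experiments);
# each protein's values across the stages form one thin line in the plot
stage_values = {stage: {protein: statistics.mean(values)
                        for protein, values in proteins.items()}
                for stage, proteins in control.items()}

# step 2: per stage, mean and sample standard deviation over all proteins;
# these are the points and error bars drawn by ax.errorbar
summary = {}
for stage, per_protein in stage_values.items():
    values = list(per_protein.values())
    summary[stage] = (statistics.mean(values), statistics.stdev(values))

for stage, (mean, std) in summary.items():
    print(f"{stage}: mean={mean:.3f}, std={std:.3f}")
```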