Helpers

Import with:

from mspypeline.helpers import get_logger
from mspypeline.helpers import get_number_rows_cols_for_fig, plot_annotate_line # etc.

Logger

mspypeline.helpers.get_logger(name=None, loglevel=10)

Parameters

name (str) – name of the logger; if None, the filename is used instead
loglevel (Union[int, str]) – log level of the logger, given either as an int or as one of the level names of the logging module (str)

Returns

A logger with respective level and formatted output

Return type

logging.Logger
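
To illustrate the two accepted forms of loglevel, here is a minimal sketch built only on the standard logging module. It is not the mspypeline implementation: the name make_logger, the format string, and the fallback to the module name (rather than the filename) are assumptions made for this example.

```python
import logging


def make_logger(name=None, loglevel=logging.DEBUG):
    """Hypothetical stand-in for a get_logger-style helper."""
    # Assumption: fall back to the module name; the documented function
    # uses the filename instead.
    logger = logging.getLogger(name or __name__)
    # Logger.setLevel accepts an int (10) or a level name ("DEBUG").
    logger.setLevel(loglevel)
    # Attach a formatted handler only once, so repeated calls do not
    # duplicate the output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


log_by_name = make_logger("sketch", "INFO")   # level given as a str
log_by_int = make_logger("sketch_int", 10)    # level given as an int (DEBUG)
log_by_name.info("logger configured")
```

The default loglevel=10 in the signature above corresponds to logging.DEBUG.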

Utils

class mspypeline.helpers.Utils.DataDict(data_source, *args, **kwargs)

Extends the standard dictionary with an additional data source. When a missing key is looked up, the data source is searched for a correspondingly named method; for example, looking up the key "parameters" searches for a method named "preprocess_parameters". That method is expected to return data, which is then stored under the key. This allows data to be loaded from disk on demand instead of loading all possible data at the beginning.

__init__(data_source, *args, **kwargs)

Parameters

data_source – class which will be searched for methods
args – passed to dict.__init__
kwargs – passed to dict.__init__
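
The on-demand lookup described for DataDict can be sketched with dict.__missing__. This is an illustration of the idea, not the class's actual code: LazyDict and ExampleSource are hypothetical names, and the "preprocess_" prefix is taken from the single example given in the description.

```python
class LazyDict(dict):
    """Hypothetical dict that loads missing keys from a data source."""

    def __init__(self, data_source, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.data_source = data_source

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        method = getattr(self.data_source, f"preprocess_{key}", None)
        if method is None:
            raise KeyError(key)
        # Store the result so the loader runs only once per key.
        self[key] = method()
        return self[key]


class ExampleSource:
    """Hypothetical data source that counts how often it is asked to load."""

    def __init__(self):
        self.calls = 0

    def preprocess_parameters(self):
        self.calls += 1
        return {"threshold": 0.05}


source = ExampleSource()
data = LazyDict(source, already_loaded=1)
first = data["parameters"]    # triggers preprocess_parameters
second = data["parameters"]   # served from the dict, no second call
```

In this sketch only the square-bracket lookup triggers loading; dict.get does not call __missing__.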

mspypeline.helpers.Utils.get_intersection_and_unique(v1, v2, na_function=<function get_number_of_non_na_values>)

Given two dataframes with identical index, determines which rows (proteins) are present in both dataframes, or unique to either dataframe. The number of missing values is counted row-wise and all rows above a threshold are marked as positive. The threshold is determined by the na_function, which determines the allowed number of missing values based on the number of columns. All rows which are positive in both dataframes are then marked as intersection. Rows that are positive in one dataframe but are missing completely in the other are marked as unique.

Parameters

v1 (pandas.core.frame.DataFrame) – first dataframe
v2 (pandas.core.frame.DataFrame) – second dataframe
na_function (Callable) – function which takes and returns an int

Returns

Three masks: the intersection, unique in v1, and unique in v2
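
The following dependency-free sketch shows one reading of the intersection/unique logic, using plain dicts of lists in place of dataframes and None for a missing value. It is an interpretation, not the library code: the names allowed_missing and intersection_and_unique are hypothetical, and it assumes a row counts as positive when its number of missing values does not exceed the number returned by na_function.

```python
def allowed_missing(n_columns):
    # Hypothetical na_function: allow up to half of the columns to be missing.
    return n_columns // 2


def intersection_and_unique(rows1, rows2, na_function=allowed_missing):
    """rows1 and rows2 map the same row labels to lists of values."""

    def positive(rows):
        # Assumption: positive means "missing values within the allowed number".
        return {
            label: sum(v is None for v in values) <= na_function(len(values))
            for label, values in rows.items()
        }

    def all_missing(rows):
        return {label: all(v is None for v in values) for label, values in rows.items()}

    pos1, pos2 = positive(rows1), positive(rows2)
    empty1, empty2 = all_missing(rows1), all_missing(rows2)
    intersection = {label: pos1[label] and pos2[label] for label in rows1}
    unique1 = {label: pos1[label] and empty2[label] for label in rows1}
    unique2 = {label: pos2[label] and empty1[label] for label in rows1}
    return intersection, unique1, unique2


group1 = {"P1": [1.0, 2.0, 3.0], "P2": [1.0, None, 2.0], "P3": [None, None, None]}
group2 = {"P1": [4.0, None, 5.0], "P2": [None, None, None], "P3": [1.0, 2.0, 3.0]}
both, only1, only2 = intersection_and_unique(group1, group2)
# both  -> {"P1": True,  "P2": False, "P3": False}
# only1 -> {"P1": False, "P2": True,  "P3": False}
# only2 -> {"P1": False, "P2": False, "P3": True}
```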

mspypeline.helpers.Utils.get_legend_elements(labels, color_map=None, marker_size=None)

Returns custom legend elements based on a list of labels and an optional color map. These elements can be passed to a legend via the "handles" parameter.

Parameters

labels (list) – list of strings
color_map (Optional[dict]) – dict of strings, with keys being the name of a label and values the corresponding color
marker_size (Optional[int]) –

mspypeline.helpers.Utils.get_plot_name_suffix(df_to_use=None, level=None)

Generate a suffix for the plot name.

Parameters

df_to_use (Optional[str]) – dataframe that was used
level (Optional[int]) – level on which data was aggregated

Returns

a string which can be used as a suffix for file paths

Return type

str

mspypeline.helpers.Utils.plot_annotate_line(ax, row1, row2, x, data, fs=None, maxasterix=3)

Annotate a plot with p-values using line indicators. Adapted from: https://stackoverflow.com/questions/11517986/indicating-the-statistically-significant-difference-in-bar-graph

Parameters

ax – axis of the plot on which to draw the annotation line
num1 – number of the left bar to put the bracket over
num2 – number of the right bar to put the bracket over
data – string to write, or a number used to generate asterisks
fs (int) – font size
maxasterix (int) – maximum number of asterisks to write (for very small p-values)
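
How a numeric data value might be turned into asterisks, capped by maxasterix, can be sketched as follows. The function name significance_text, the starting threshold of 0.05, the factor of 10 per additional asterisk, and the "n. s." fallback are all assumptions for this illustration; they are not stated on this page.

```python
def significance_text(data, maxasterix=3):
    """Hypothetical helper: return the annotation text for a p-value or string."""
    # A string is written as-is.
    if isinstance(data, str):
        return data
    text = ""
    threshold = 0.05  # assumed first significance level
    while data < threshold:
        text += "*"
        threshold /= 10.0  # assumed: each extra asterisk needs a 10x smaller p-value
        # Stop once the maximum number of asterisks has been written.
        if maxasterix and len(text) == maxasterix:
            break
    # Assumed fallback label for non-significant values.
    return text if text else "n. s."


labels = [significance_text(p) for p in (0.2, 0.03, 0.003, 0.00001)]
# labels -> ["n. s.", "*", "**", "***"]
```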