Helpers

Import with:

from mspypeline.helpers import get_logger
from mspypeline.helpers import get_number_rows_cols_for_fig, plot_annotate_line # etc.

Logger

mspypeline.helpers.get_logger(name=None, loglevel=10)
Parameters
  • name (str) – name of the logger; if None, the filename is used instead

  • loglevel (Union[int, str]) – log level of the logger, either an int or one of the level names (str) defined in the logging module

Returns

A logger with the given level and formatted output

Return type

logging.Logger
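To illustrate the documented behaviour (optional name with a filename fallback, level given as int or level name), here is a minimal standard-library sketch. The function make_logger is a hypothetical stand-in, not mspypeline's implementation, and the log format string is an assumption.

```python
import inspect
import logging
from pathlib import Path
from typing import Optional, Union


def make_logger(name: Optional[str] = None, loglevel: Union[int, str] = 10) -> logging.Logger:
    """Hypothetical stand-in for get_logger, built only on the standard library."""
    if name is None:
        # Fall back to the caller's file name (without extension)
        name = Path(inspect.stack()[1].filename).stem
    logger = logging.getLogger(name)
    # setLevel accepts an int (e.g. 10) or a registered level name (e.g. "DEBUG")
    logger.setLevel(loglevel)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        # Assumed format; the format mspypeline actually uses may differ
        handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger


log = make_logger("helpers_demo", "INFO")
log.info("logger ready")  # written to stderr
log.debug("hidden")       # below INFO, so it is filtered out
```

Passing "INFO" and 20 yields the same level, because logging.Logger.setLevel resolves registered level names itself.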

Utils

class mspypeline.helpers.Utils.DataDict(data_source, *args, **kwargs)

Extends the standard dictionary with an additional DataSource. When a missing key is looked up, the DataSource is searched for a method named preprocess_<key>; e.g. looking up key=“parameters” searches for a method named “preprocess_parameters”. This method is expected to return data, which is then stored under the key. This allows data from disk to be loaded on demand instead of loading all possible data at the beginning.

__init__(data_source, *args, **kwargs)
Parameters
  • data_source – object whose preprocess_<key> methods are searched when a key is missing

  • args – passed to dict.__init__

  • kwargs – passed to dict.__init__
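The lazy-loading idea described above maps naturally onto Python's dict.__missing__ hook. The sketch below is a hypothetical re-implementation (LazyDict and ToySource are illustrative names, not part of mspypeline) showing a loader that runs once and is then cached under the key.

```python
from typing import Any


class LazyDict(dict):
    """Hypothetical re-implementation of the DataDict idea via dict.__missing__."""

    def __init__(self, data_source: Any, *args, **kwargs):
        super().__init__(*args, **kwargs)  # args/kwargs are passed to dict.__init__
        self.data_source = data_source

    def __missing__(self, key: str) -> Any:
        # Called by d[key] only when key is absent: look for "preprocess_<key>"
        method = getattr(self.data_source, f"preprocess_{key}", None)
        if method is None:
            raise KeyError(key)
        value = method()   # load the data on demand
        self[key] = value  # cache it so the loader runs only once
        return value


class ToySource:
    """Hypothetical data source that records how often data is loaded."""

    def __init__(self):
        self.calls = 0

    def preprocess_parameters(self):
        self.calls += 1
        return {"threshold": 0.5}


source = ToySource()
data = LazyDict(source)
print(data["parameters"])  # {'threshold': 0.5}, loaded via preprocess_parameters
print(data["parameters"])  # {'threshold': 0.5}, served from the dict
print(source.calls)        # 1
```

Note that in a dict subclass only d[key] triggers __missing__; d.get(key) and key in d do not, so those calls will not load anything.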

mspypeline.helpers.Utils.get_intersection_and_unique(v1, v2, na_function=<function get_number_of_non_na_values>)

Given two dataframes with identical index, determines which rows (proteins) are present in both dataframes or unique to either dataframe. Missing values are counted row-wise, and rows with enough non-missing values to pass a threshold are marked as positive. The threshold is determined by na_function, which derives the allowed number of missing values from the number of columns. All rows that are positive in both dataframes are then marked as intersection. Rows that are positive in one dataframe but missing completely in the other are marked as unique.

Parameters
  • v1 (pandas.core.frame.DataFrame) – first dataframe

  • v2 (pandas.core.frame.DataFrame) – second dataframe

  • na_function (Callable) – function which takes and returns an int

Returns

Three masks: the intersection, the rows unique to v1 and the rows unique to v2

Return type

tuple
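The masking logic can be sketched with pandas as follows. This is an illustration under stated assumptions, not the library's code: missing values are taken to be NaN, and required_non_na is a hypothetical threshold (at least half the columns, minimum 1) standing in for the default get_number_of_non_na_values.

```python
import numpy as np
import pandas as pd


def required_non_na(n_columns: int) -> int:
    # Hypothetical threshold: at least half of the columns (rounded up), never fewer than 1
    return max(1, -(-n_columns // 2))


def intersection_and_unique(v1: pd.DataFrame, v2: pd.DataFrame, na_function=required_non_na):
    # A row is "positive" if it has enough non-missing values (NaN counts as missing here)
    positive1 = v1.notna().sum(axis=1) >= na_function(v1.shape[1])
    positive2 = v2.notna().sum(axis=1) >= na_function(v2.shape[1])
    intersection = positive1 & positive2
    # Unique: positive in one dataframe and completely missing in the other
    unique_v1 = positive1 & v2.isna().all(axis=1)
    unique_v2 = positive2 & v1.isna().all(axis=1)
    return intersection, unique_v1, unique_v2


index = ["P1", "P2", "P3", "P4"]
v1 = pd.DataFrame({"a": [1.0, 2.0, np.nan, np.nan], "b": [1.0, np.nan, 3.0, np.nan]}, index=index)
v2 = pd.DataFrame({"c": [5.0, np.nan, np.nan, 4.0], "d": [np.nan, np.nan, np.nan, 2.0]}, index=index)
inter, only_v1, only_v2 = intersection_and_unique(v1, v2)
print(list(inter[inter].index))      # ['P1']
print(list(only_v1[only_v1].index))  # ['P2', 'P3']
print(list(only_v2[only_v2].index))  # ['P4']
```

A row that is positive in one dataframe but only partially present in the other ends up in none of the three masks, which matches the description of "unique" as requiring complete absence.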

mspypeline.helpers.Utils.get_legend_elements(labels, color_map=None, marker_size=None)

Returns custom legend elements for a list of labels and an optional color map. These elements can be passed to a legend via the ‘handles’ parameter.

Parameters
  • labels (list) – list of strings

  • color_map (Optional[dict]) – dict mapping each label name (str) to its color

  • marker_size (Optional[int]) – size of the legend markers
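Custom legend handles of this kind are commonly built from marker-only matplotlib Line2D objects. The sketch below assumes that approach; legend_handles, the round marker, the default size of 8 and the "C0"…"C9" fallback colors are all illustrative choices, not mspypeline's documented behaviour.

```python
from typing import Dict, List, Optional

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


def legend_handles(labels: List[str], color_map: Optional[Dict[str, str]] = None,
                   marker_size: Optional[int] = None) -> List[Line2D]:
    """Hypothetical sketch: one marker-only handle per label."""
    if color_map is None:
        # Assumption: fall back to matplotlib's default color cycle ("C0" ... "C9")
        color_map = {label: f"C{i % 10}" for i, label in enumerate(labels)}
    if marker_size is None:
        marker_size = 8  # assumed default, chosen arbitrarily
    # An empty line with no line style draws only the marker in the legend
    return [
        Line2D([], [], marker="o", linestyle="none", color=color_map[label],
               markersize=marker_size, label=label)
        for label in labels
    ]


handles = legend_handles(["control", "treated"], {"control": "tab:blue", "treated": "tab:orange"})
fig, ax = plt.subplots()
legend = ax.legend(handles=handles)  # handles are passed via the 'handles' parameter
```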

mspypeline.helpers.Utils.get_plot_name_suffix(df_to_use=None, level=None)

Generates a suffix for the plot name.

Parameters
  • df_to_use (Optional[str]) – name of the dataframe that was used

  • level (Optional[int]) – level on which data was aggregated

Returns

a string which can be used as a suffix for file paths

Return type

str
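As a rough illustration of how such a suffix might be assembled and appended to a file name, here is a sketch. The function plot_name_suffix and its "_<df>_level_<n>" format are hypothetical; the format mspypeline actually produces is not documented here.

```python
from typing import Optional


def plot_name_suffix(df_to_use: Optional[str] = None, level: Optional[int] = None) -> str:
    """Hypothetical suffix builder: each given part is appended with a leading underscore."""
    suffix = ""
    if df_to_use is not None:
        suffix += f"_{df_to_use}"
    if level is not None:
        suffix += f"_level_{level}"
    return suffix


file_name = f"pca_overview{plot_name_suffix('raw_log2', 0)}.pdf"
print(file_name)  # pca_overview_raw_log2_level_0.pdf
```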

mspypeline.helpers.Utils.plot_annotate_line(ax, row1, row2, x, data, fs=None, maxasterix=3)

Annotates a plot with p-values using line indicators. Adjusted from: https://stackoverflow.com/questions/11517986/indicating-the-statistically-significant-difference-in-bar-graph

Parameters
  • ax – axis of the plot to put the annotation line on

  • row1 – number of the left bar to put the bracket over

  • row2 – number of the right bar to put the bracket over

  • x –

  • data – string to write, or a number (p-value) from which the asterisks are generated

  • fs (int) – font size

  • maxasterix (int) – maximum number of asterisks to write (for very small p-values)
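The data parameter accepts either ready-made text or a p-value to be turned into asterisks. The sketch below assumes the scheme from the linked Stack Overflow answer (one asterisk for p < 0.05 and one more per further factor of 10, capped at maxasterix, "n. s." otherwise); significance_text is a hypothetical helper and the thresholds used by mspypeline may differ.

```python
from typing import Union


def significance_text(data: Union[str, float], maxasterix: int = 3) -> str:
    """Hypothetical helper: convert a p-value into an asterisk string."""
    if isinstance(data, str):
        return data  # strings are written as given
    text = ""
    threshold = 0.05
    while data < threshold:
        text += "*"
        threshold /= 10  # each extra asterisk needs a 10x smaller p-value
        if maxasterix and len(text) == maxasterix:
            break  # cap the number of asterisks for very small p-values
    return text or "n. s."


print(significance_text(0.03))    # *
print(significance_text(0.001))   # **
print(significance_text(1e-10))   # ***
print(significance_text(0.2))     # n. s.
```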