Modules
Import with:
from mspypeline import MedianNormalizer, QuantileNormalizer, TailRobustNormalizer, interpolate_data
from mspypeline.modules.Normalization import BaseNormalizer
from mspypeline import DataNode, DataTree
Normalization
-
class
mspypeline.modules.Normalization.
BaseNormalizer
(input_scale='log2', output_scale='normal', col_name_prefix=None, loglevel=10, **kwargs) Abstract base class for normalizers. Derived normalizers should implement the fit() and transform() methods.
-
__init__
(input_scale='log2', output_scale='normal', col_name_prefix=None, loglevel=10, **kwargs)
- Parameters
input_scale (str) – Scale of the input data. Either normal or log2
output_scale (str) – Scale of the output data. Either normal or log2
col_name_prefix (Optional[str]) – If not None the prefix is added to each column name
loglevel (int) – loglevel of the logger
kwargs – accepts kwargs
-
abstract
fit
(data) Abstract fit method. Should return self.
- Parameters
data (pandas.core.frame.DataFrame) – Should be a DataFrame or ndarray.
- Returns
The normalizer instance.
- Return type
self
-
fit_transform
(data) Chains the fit and transform methods.
- Parameters
data (pandas.core.frame.DataFrame) – Should be a DataFrame or ndarray.
- Returns
transformed data
- Return type
DataFrame
-
abstract
transform
(data) Abstract transform method. Should return the transformed data.
- Parameters
data (pandas.core.frame.DataFrame) – Should be a DataFrame or ndarray.
- Returns
transformed data
- Return type
DataFrame
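The contract described for this base class (abstract fit() and transform(), with fit_transform() chaining the two) can be sketched without the library. This is a minimal, dependency-free illustration, not mspypeline's implementation: SketchNormalizer and MeanCenterer are hypothetical names, and a plain list stands in for a DataFrame column.

```python
from abc import ABC, abstractmethod


class SketchNormalizer(ABC):
    """Hypothetical stand-in for a normalizer base class (not mspypeline code)."""

    @abstractmethod
    def fit(self, data):
        """Learn whatever state is needed from data and return self."""

    @abstractmethod
    def transform(self, data):
        """Return a transformed copy of data."""

    def fit_transform(self, data):
        # Chaining works because fit() returns the instance itself.
        return self.fit(data).transform(data)


class MeanCenterer(SketchNormalizer):
    """Toy subclass: subtracts the mean learned in fit() from every value."""

    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        return self

    def transform(self, data):
        return [value - self.mean_ for value in data]


centered = MeanCenterer().fit_transform([1.0, 2.0, 6.0])
print(centered)  # [-2.0, -1.0, 3.0]
```

Returning self from fit() is what makes the one-line fit_transform() possible, which is presumably why the abstract method asks for it.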
-
-
class
mspypeline.
MedianNormalizer
(input_scale='log2', output_scale='normal', col_name_prefix=None, loglevel=10, **kwargs) Median normalizer, which calculates the median protein intensity for each sample (column). The mean of all sample-wise medians is then subtracted from each sample median, giving one correction factor per sample. This correction factor is subtracted from each protein intensity of the respective sample.
-
__init__
(input_scale='log2', output_scale='normal', col_name_prefix=None, loglevel=10, **kwargs)
- Parameters
input_scale (str) – Scale of the input data. Either normal or log2
output_scale (str) – Scale of the output data. Either normal or log2
col_name_prefix (Optional[str]) – If not None the prefix is added to each column name
loglevel (int) – loglevel of the logger
kwargs – accepts kwargs
-
fit
(data) Fits the normalizer to the data. Returns self.
- Parameters
data (pandas.core.frame.DataFrame) – Should be a DataFrame or ndarray.
- Returns
The normalizer instance.
- Return type
self
-
transform
(data) Applies the normalization. Returns the transformed data.
- Parameters
data (pandas.core.frame.DataFrame) – Should be a DataFrame or ndarray.
- Returns
transformed data
- Return type
DataFrame
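The median normalization described for this class can be illustrated on a small example. The sketch below is written in plain Python to stay dependency-free; the function name median_normalize and the dict-of-lists input are assumptions made for illustration, and scale conversion and missing-value handling are deliberately left out.

```python
from statistics import mean, median


def median_normalize(samples):
    """Shift every sample so that all sample medians coincide.

    samples maps a sample name to its list of (log-scale) intensities.
    """
    medians = {name: median(values) for name, values in samples.items()}
    overall = mean(medians.values())
    # Correction factor per sample: its median minus the mean of all medians.
    corrections = {name: med - overall for name, med in medians.items()}
    return {
        name: [value - corrections[name] for value in values]
        for name, values in samples.items()
    }


samples = {"sample_a": [1.0, 2.0, 3.0], "sample_b": [3.0, 4.0, 5.0]}
normalized = median_normalize(samples)
print(normalized)
# {'sample_a': [2.0, 3.0, 4.0], 'sample_b': [2.0, 3.0, 4.0]}
```

After the shift, both samples share the median 3.0, which is the mean of the original medians 2.0 and 4.0.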
-
-
class
mspypeline.
QuantileNormalizer
(missing_value_handler=<function interpolate_data>, input_scale='log2', output_scale='normal', col_name_prefix=None, loglevel=10, **kwargs) Quantile normalizer, which first ranks the proteins of each sample (column) by their intensity value. The mean protein intensity per rank across all samples is calculated and assigned to the protein holding that rank in each sample. The data is then rearranged to the original order of the intensity values for each sample. For a more in-depth description, see: https://en.wikipedia.org/wiki/Quantile_normalization
-
__init__
(missing_value_handler=<function interpolate_data>, input_scale='log2', output_scale='normal', col_name_prefix=None, loglevel=10, **kwargs)
- Parameters
missing_value_handler (Optional[Callable]) – function to fill missing values
input_scale (str) – Scale of the input data. Either normal or log2
output_scale (str) – Scale of the output data. Either normal or log2
col_name_prefix (Optional[str]) – If not None the prefix is added to each column name
loglevel (int) – loglevel of the logger
kwargs – accepts kwargs
-
fit
(data) Fits the normalizer to the data. Returns self.
- Parameters
data (pandas.core.frame.DataFrame) – Should be a DataFrame or ndarray.
- Returns
The normalizer instance.
- Return type
self
-
transform
(data) Applies the normalization. Returns the transformed data.
- Parameters
data (pandas.core.frame.DataFrame) – Should be a DataFrame or ndarray.
- Returns
transformed data
- Return type
DataFrame
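The rank, average, and reassign steps of quantile normalization can be sketched as follows. This is a simplified, dependency-free illustration and not the library's code: the function name and the dict-of-lists input are assumptions, all samples are assumed to have the same length, and ties and missing values are not treated specially.

```python
def quantile_normalize(samples):
    """Give every sample the same distribution of values.

    samples maps a sample name to a list of intensities of equal length.
    """
    names = list(samples)
    n_rows = len(samples[names[0]])
    # Sort each sample, then average across samples rank by rank.
    sorted_cols = [sorted(samples[name]) for name in names]
    rank_means = [
        sum(col[rank] for col in sorted_cols) / len(sorted_cols)
        for rank in range(n_rows)
    ]
    result = {}
    for name in names:
        values = samples[name]
        # Row indices ordered from the smallest to the largest intensity.
        order = sorted(range(n_rows), key=lambda row: values[row])
        normalized = [0.0] * n_rows
        for rank, row in enumerate(order):
            normalized[row] = rank_means[rank]  # restore the original row order
        result[name] = normalized
    return result


samples = {"a": [5.0, 2.0, 3.0], "b": [4.0, 1.0, 6.0]}
normalized = quantile_normalize(samples)
print(normalized)
# {'a': [5.5, 1.5, 3.5], 'b': [3.5, 1.5, 5.5]}
```

Both samples end up with the values 1.5, 3.5 and 5.5, arranged according to each sample's original ranking.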
-
-
class
mspypeline.
TailRobustNormalizer
(normalizer=<class 'mspypeline.modules.Normalization.QuantileNormalizer'>, missing_value_handler=<function interpolate_data>, input_scale='log2', output_scale='normal', col_name_prefix=None, loglevel=10, **kwargs) Tail Robust Normalizer, which first calculates an offset for each sample (column) by taking the sample-wise mean and subtracts it from each protein of the respective sample. A normalization is then applied, and the respective offset is added back to each protein of the sample. The performed calculation is an abstracted implementation of the Tail Robust Quantile Normalization as described here: https://www.biorxiv.org/content/10.1101/2020.04.17.046227v1.full .
-
__init__
(normalizer=<class 'mspypeline.modules.Normalization.QuantileNormalizer'>, missing_value_handler=<function interpolate_data>, input_scale='log2', output_scale='normal', col_name_prefix=None, loglevel=10, **kwargs)
- Parameters
normalizer (Type[mspypeline.modules.Normalization.BaseNormalizer]) – a normalizer that should be used in combination with this normalizer
missing_value_handler (Optional[Callable]) – function to fill missing values
input_scale (str) – Scale of the input data. Either normal or log2
output_scale (str) – Scale of the output data. Either normal or log2
col_name_prefix (Optional[str]) – If not None the prefix is added to each column name
loglevel (int) – loglevel of the logger
kwargs – accepts kwargs
-
fit
(data) Fits the normalizer to the data. Returns self.
- Parameters
data (pandas.core.frame.DataFrame) – Should be a DataFrame or ndarray.
- Returns
The normalizer instance.
- Return type
self
-
transform
(data) Applies the normalization. Returns the transformed data.
- Parameters
data (pandas.core.frame.DataFrame) – Should be a DataFrame or ndarray.
- Returns
transformed data
- Return type
DataFrame
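The offset, normalize, and restore sequence described for this class can be sketched in a few lines. The code below is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, samples are plain lists of equal length, and a simplified quantile normalization (no tie or missing-value handling) stands in for the wrapped normalizer.

```python
def quantile_normalize(samples):
    """Simplified quantile normalization used as the inner normalizer."""
    n_rows = len(next(iter(samples.values())))
    sorted_cols = [sorted(values) for values in samples.values()]
    rank_means = [
        sum(col[rank] for col in sorted_cols) / len(sorted_cols)
        for rank in range(n_rows)
    ]
    result = {}
    for name, values in samples.items():
        order = sorted(range(n_rows), key=lambda row: values[row])
        normalized = [0.0] * n_rows
        for rank, row in enumerate(order):
            normalized[row] = rank_means[rank]
        result[name] = normalized
    return result


def tail_robust_normalize(samples, normalizer=quantile_normalize):
    """Remove each sample's mean, normalize, then add the mean back."""
    offsets = {name: sum(values) / len(values) for name, values in samples.items()}
    centered = {
        name: [value - offsets[name] for value in values]
        for name, values in samples.items()
    }
    normalized = normalizer(centered)
    return {
        name: [value + offsets[name] for value in values]
        for name, values in normalized.items()
    }


samples = {"a": [1.0, 2.0, 6.0], "b": [10.0, 12.0, 14.0]}
result = tail_robust_normalize(samples)
print(result)
# {'a': [1.0, 2.5, 5.5], 'b': [10.0, 11.5, 14.5]}
```

Because the offsets are added back, each sample keeps its own mean (3.0 and 12.0 here) while the shape of its distribution is aligned with the other samples.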
-
-
mspypeline.
interpolate_data
(data) Performs interpolation of missing values (protein intensity = 0) on the data by sampling from the same distribution as the input distribution. Adapted from https://github.com/bmbolstad/preprocessCore, more specifically: https://github.com/bmbolstad/preprocessCore/blob/master/src/qnorm.c
- Parameters
data (pandas.core.frame.DataFrame) – A DataFrame with the samples as columns and the features as rows
- Returns
Filled data in which all values have been replaced by interpolating from the old data column-wise. For the non-missing entries the new values are very close to the old ones, while the missing entries are assigned a sampled value.
- Return type
DataFrame
DataStructure
-
class
mspypeline.
DataNode
(name='', level=0, parent=None, data=None, children=None)
-
__init__
(name='', level=0, parent=None, data=None, children=None) Default parameters will return a root node.
- Parameters
name (str) – Name of the node
level (int) – depth of the node
parent (Optional[mspypeline.modules.DataStructure.DataNode]) – Parent of this node
data (pandas.core.series.Series) – Is None when there are nodes below this one which were not aggregated as technical replicates
children (Dict[str, DataNode]) – Maps name of a child to a child node
See also
DataTree()
A class to help construct a node structure from data
-
aggregate
(method='mean', go_max_depth=False, index=None)
- Parameters
method (Union[None, str, Callable]) – If None no aggregation will be applied. Otherwise needs to be accepted by the pandas aggregate method (e.g. pandas.DataFrame.aggregate).
go_max_depth (bool) – If technical replicates were aggregated, this can be specified to use the unaggregated values instead.
index (Union[str, pandas.core.indexes.base.Index, None]) – Index to subset the data with. If None no index is applied
- Returns
Result of the aggregation
- Return type
Union[pd.Series, pd.DataFrame]
-
get_total_number_children
(go_max_depth=False) Gets the number of children containing data below this node. If go_max_depth is True, the search descends to the deepest DataNodes.
- Parameters
go_max_depth (bool) – Defaults to False
- Returns
The number of all children below this node
- Return type
int
-
groupby
(method='mean', go_max_depth=False, index=None) Considers each child a group, then aggregates all children.
- Parameters
method (Union[str, Callable]) – Will be passed to aggregate.
go_max_depth (bool) – Will be passed to aggregate.
index (Union[None, str, pandas.core.indexes.base.Index]) – Will be passed to aggregate.
- Returns
Result of the grouping
- Return type
data
See also
aggregate()
Will be called on each of the groups
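The relationship between aggregate() and groupby() on a node can be pictured with a small tree. The sketch below is a hypothetical, dependency-free stand-in and not the library's DataNode: SketchNode, add_child and leaves are illustrative names, lists replace pandas Series, and the go_max_depth and index options are omitted.

```python
from statistics import mean


class SketchNode:
    """Hypothetical tree node; data is a list of values or None for inner nodes."""

    def __init__(self, name="", level=0, parent=None, data=None, children=None):
        self.name = name
        self.level = level
        self.parent = parent
        self.data = data
        self.children = children if children is not None else {}

    def add_child(self, name, data=None):
        child = SketchNode(name, self.level + 1, parent=self, data=data)
        self.children[name] = child
        return child

    def leaves(self):
        """Collect the data of every data-carrying node at or below this one."""
        if self.data is not None:
            return [self.data]
        return [d for child in self.children.values() for d in child.leaves()]

    def aggregate(self, method=mean):
        """Combine all data below this node row by row with method."""
        return [method(row) for row in zip(*self.leaves())]

    def groupby(self, method=mean):
        """Treat each child as a group and aggregate it separately."""
        return {name: child.aggregate(method) for name, child in self.children.items()}


root = SketchNode()
control = root.add_child("control")
control.add_child("rep1", data=[1.0, 4.0])
control.add_child("rep2", data=[3.0, 6.0])
treated = root.add_child("treated")
treated.add_child("rep1", data=[5.0, 8.0])

print(root.groupby())    # {'control': [2.0, 5.0], 'treated': [5.0, 8.0]}
print(root.aggregate())  # [3.0, 6.0]
```

In this sketch groupby() returns one aggregated result per child, while aggregate() collapses everything below the node into a single result.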
-
-
class
mspypeline.
DataTree
(root) Data structure in which the experiment is stored. Each leaf node is a DataNode.
-
level_keys_full_name
For every depth level, holds the DataNode.full_name of all nodes on that level.
- Type
Dict[int, List[str]]
-
__init__
(root)
- Parameters
root (mspypeline.modules.DataStructure.DataNode) – The root node of the Tree.
-
add_data
(data)
- Parameters
data (pandas.core.frame.DataFrame) – Data which will be used to fill the nodes with a Series. The column names of the data need to be the same as the full names of the DataNode.
-
aggregate
(key=None, method='mean', go_max_depth=False, index=None)
- Parameters
key (Optional[str]) –
method (Union[None, str, Callable]) –
go_max_depth (bool) –
index (Optional) –
- Return type
Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
-
aggregate_technical_replicates
() Aggregates the deepest level to one level above by using aggregate.
-
classmethod
from_analysis_design
(analysis_design, data=None, should_aggregate_technical_replicates=True)
- Parameters
analysis_design (dict) – nested dict
data (Union[None, pandas.core.frame.DataFrame]) – Will be passed to add_data. If None no data is added
should_aggregate_technical_replicates (bool) – If True the lowest level of the analysis design is considered as a technical replicate and averaged
- Returns
- Return type
cls
See also
add_data()
will be called if data is not None
aggregate_technical_replicates()
will be called if should_aggregate_technical_replicates
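How a nested analysis design could translate into per-level full names (the Dict[int, List[str]] shape of level_keys_full_name) is sketched below. Several details are assumptions made for illustration only: the exact layout of the nested dict, joining names with an underscore, and counting levels from the first layer of the design. The function shown is hypothetical and not part of mspypeline.

```python
def level_keys_full_name(analysis_design, separator="_"):
    """Map each depth level to the full names of the nodes on that level."""
    levels = {}

    def walk(subtree, prefix, level):
        for key, value in subtree.items():
            # Assumed convention: a full name is the path joined by the separator.
            full_name = f"{prefix}{separator}{key}" if prefix else key
            levels.setdefault(level, []).append(full_name)
            if isinstance(value, dict):
                walk(value, full_name, level + 1)

    walk(analysis_design, "", 0)
    return levels


# Hypothetical design: two groups, replicates on the lowest level.
design = {
    "control": {"rep1": "control_rep1", "rep2": "control_rep2"},
    "treated": {"rep1": "treated_rep1"},
}
print(level_keys_full_name(design))
# {0: ['control', 'treated'], 1: ['control_rep1', 'control_rep2', 'treated_rep1']}
```

Under these assumptions, the names on the deepest level are the ones that the column names passed to add_data would have to match.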
-
groupby
(key_or_index=None, new_col_name=None, method='mean', go_max_depth=False, index=None)
- Parameters
key_or_index (Union[None, str, int]) –
new_col_name (str) –
method (Union[None, str, Callable]) –
go_max_depth (bool) –
index –
- Return type
Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
-