Gallery¶

All available visualization options are presented below. In addition, a minimal Python code example shows how to create a MaxQuantPlotter, which is then used to generate all of the following plots.

For every generated graphic, a short description is provided to help understand the underlying calculations. A more detailed biological analysis is provided in benchmark dataset analysis, which explains the data structure and analysis design of the experimental samples as well as possible evaluation approaches for the different plots.

Plotter creation¶

First, a plotter object has to be created to make the plots. Here, the MaxQuantPlotter is built from the MSPInitializer class, which creates and reads in the configuration file and initiates the MQReader that loads the example data set provided.

In [1]: import pandas as pd

In [2]: import os

In [3]: from mspypeline import load_example_dataset, MaxQuantPlotter

# load the data that is provided in a submodule
In [4]: init = load_example_dataset(configs={
   ...:     "has_techrep": True,
   ...:     "pathways": ["BIOCARTA_EGF_PATHWAY.txt", "HALLMARK_IL2_STAT5_SIGNALING.txt"],
   ...:     "go_terms": ["GO_APOPTOTIC_SIGNALING_PATHWAY.txt", "GO_INFLAMMATORY_RESPONSE.txt"]
   ...:     })
   ...: 

In [5]: plotter = MaxQuantPlotter.from_MSPInitializer(init)

# create a second plotter without collapsed technical replicates
In [6]: init = load_example_dataset(configs={"has_techrep": False, "pathways":[]})

In [7]: plotter_with_tech_reps = MaxQuantPlotter.from_MSPInitializer(init)

Define some helper functions and configurations:

In [8]: on_rtd = os.environ.get('READTHEDOCS', False) == 'True'

In [9]: static_dir = "./_static" if on_rtd else "./source/_static"

In [10]: savefig_dir = "./savefig" if on_rtd else "./source/savefig"

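# plt refers to matplotlib.pyplot (import matplotlib.pyplot as plt)
# select_fig makes the figure of the idx-th returned plot the current figure
In [11]: def select_fig(plts, idx):
   ....:     plt.figure(plts[idx][0].number)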
   ....: 

MaxQuant Report¶

Created using: create_report()

The MaxQuant report was built to offer broad insight into the different sources of information in the MaxQuant output tables. Besides the protein intensities (from the proteinGroups.txt file), which are the only source of data for all other parts of the analysis with the MaxQuant Plotter, further information about experimental and technical parameters of the experiment is taken into account.

The MaxQuant report can function as a data quality control tool and outputs a multi-page pdf document composed of a variety of information and graphics.

Make sure that all MaxQuant files which are used to create the report are provided.

For overview of plots see analysis options

In [12]: plotter.create_report(static_dir);

# print("skipping report")

The resulting MaxQuant Report.

Normalization Plots¶

The helper function plot_all_normalizer_overview() is used to generate the same plot multiple times with different normalization methods applied to the base data.

Normalization overview¶

Created using: plot_normalization_overview_all_normalizers() by calling plot_normalization_overview()

The Normalization overview makes it possible to examine different aspects of the data in three distinct plots. For each normalization method provided, an additional page is attached to the resulting pdf file, starting with the raw (not normalized) data. That way it is possible to get a better understanding of the effects of the normalization methods on the data, to inspect the different approaches and to find the most suitable normalization for the data.

The normalization overview combines the plots plot_kde() (see KDE example), plot_n_proteins_vs_quantile() (see proteins vs quantiles example) and plot_boxplot() (see boxplot example).

To view adjustable parameters see “plot_normalization_overview_all_normalizers_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

In [13]: plotter.plot_normalization_overview_all_normalizers("raw_log2", 0, save_path=static_dir);

# print("skipping norm overview")

View this normalization overview example.

Heatmap overview¶

Created using: plot_heatmap_overview_all_normalizers() by calling plot_intensity_heatmap().

Creates the intensity heatmap overview for all normalization methods.
The intensity heatmap demonstrates protein intensities, where samples are given in rows on the y axis and
proteins on the x axis. Missing values are colored in gray.
The heatmap can be used to spot patterns in the different normalization methods and to
understand how different intensity types affect the data.

To view adjustable parameters see “plot_heatmap_overview_all_normalizers_settings:” in the Adjustable Options Configs
For overview of plots see analysis options
For exemplary plot see gallery

Note

If the heatmap seems blurred try downloading it and using a different PDF viewer

In [14]: plotter.plot_heatmap_overview_all_normalizers("raw_log2", 0, vmin=19.5, vmax=40, save_path=static_dir);

# print("skipping heatmap overview")

View this heatmap overview example.

Outlier detection and comparison plots¶

Detection counts¶

Created using: plot_detection_counts()

Uses get_detection_counts_data() to count the number of intensity values > 0 per protein (number of samples that the protein is detected in) per group of the selected level.

The data is plotted and saved using save_detection_counts_results() as a bar diagram showing how often proteins are detected in a number of samples/replicates for each group.
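As a rough illustration of the counting described above, the following sketch uses plain Python and made-up intensities. The sample names, the values and the helper detection_counts are hypothetical and not part of mspypeline; the sketch only shows the idea of counting, per protein, the samples with an intensity > 0.

```python
from collections import Counter

# Hypothetical intensities of one group: protein -> {sample: intensity}.
# A value of 0 means the protein was not detected in that sample.
intensities = {
    "P1": {"A_Rep1": 25.1, "A_Rep2": 24.8, "A_Rep3": 25.3},
    "P2": {"A_Rep1": 0.0, "A_Rep2": 21.0, "A_Rep3": 20.5},
    "P3": {"A_Rep1": 0.0, "A_Rep2": 0.0, "A_Rep3": 19.9},
    "P4": {"A_Rep1": 22.2, "A_Rep2": 22.0, "A_Rep3": 0.0},
}


def detection_counts(data):
    """Return {n_samples: number of proteins detected in exactly n_samples}."""
    per_protein = {
        protein: sum(1 for value in samples.values() if value > 0)
        for protein, samples in data.items()
    }
    return dict(sorted(Counter(per_protein.values()).items()))


counts = detection_counts(intensities)
print(counts)
```

Each key of the resulting dictionary corresponds to one bar of the diagram for this group.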

To view adjustable parameters see “plot_detection_counts_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

In [15]: plotter.plot_detection_counts("lfq_log2", 0, save_path=None);

Number of detected proteins¶

Created using: plot_detected_proteins_per_replicate()

Uses get_detected_proteins_per_replicate_data() to count the number of protein intensity values greater than 0 (number of detected proteins) per sample of a group from the selected level.

The data is plotted and saved using save_detected_proteins_per_replicate_results() as bar diagram showing the number of detected proteins per sample as well as the total number of detected proteins for each group of a selected level.

The average number of detected proteins per group is indicated as gray dashed line.

To view adjustable parameters see “plot_detected_proteins_per_replicate_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

Depending on whether technical replicates should be averaged (top graph) or not (bottom graph) the data and resulting plot will have different outcomes. The number of detected proteins per sample changes as 0 values are handled as missing values (“nan”) and neglected when calculating the mean of samples.
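The effect of averaging technical replicates can be sketched in plain Python as follows. The data and the helper nan_mean are hypothetical; the sketch only assumes, as stated above, that 0 is treated as a missing value and ignored when the mean is calculated.

```python
import math


def nan_mean(values):
    """Mean that treats 0 (and NaN) as missing and ignores missing values."""
    present = [v for v in values if v > 0 and not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


# Hypothetical technical replicates of one sample: protein -> [techrep1, techrep2]
tech_reps = {
    "P1": [24.0, 26.0],
    "P2": [0.0, 21.0],  # detected in only one technical replicate
    "P3": [0.0, 0.0],   # never detected
}

# Averaged technical replicates: a protein counts as detected if any replicate has a value.
averaged = {protein: nan_mean(values) for protein, values in tech_reps.items()}
detected_averaged = sum(1 for v in averaged.values() if not math.isnan(v))

# Without averaging, every technical replicate is counted on its own.
detected_per_techrep = [
    sum(1 for values in tech_reps.values() if values[i] > 0) for i in range(2)
]

print(detected_averaged, detected_per_techrep)
```

In this toy example the averaged sample contains more detected proteins than the first technical replicate alone, which is why the two graphs can differ.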

In [16]: plotter.plot_detected_proteins_per_replicate("lfq_log2", 1, save_path=None);

In [17]: plotter_with_tech_reps.plot_detected_proteins_per_replicate("lfq_log2", 1, save_path=None);

Venn diagrams¶

Created using: plot_venn_results()

Venn diagrams serve as a graphical illustration of set theory. In the mspypeline, protein counts (proteins with an intensity value > 0) constitute the sets, and set relationships indicate the number of proteins that are shared between two or more sets. In this way the similarity of the detected proteins of the sets can be assessed.

The function get_venn_data_per_key() is used to count the protein intensity values > 0 (number of detected proteins) for each replicate of a group from the selected level.

The method creates and saves both a venn diagram using save_venn() and a bar-venn diagram using save_bar_venn() comparing the similarity of the replicates of each group from the selected level (based on protein counts). The ordinary venn diagram is quite intuitive, but it supports a maximum of three comparisons in the mspypeline. The bar-venn diagram has the advantage of allowing an unlimited number of comparison sets. These figures consist of two combined graphs. The upper bar diagram indicates the number of unique or shared proteins of a set or of overlapping sets. The lower graph indicates which set or sets are being compared, i.e. which protein count (upper graph) belongs to which comparison (lower graph).
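The set relationships behind both diagram types can be sketched with Python sets. The replicate names, protein identifiers and the helper exclusive_overlaps are hypothetical, and counting each protein in exactly one region (unique to a set or shared by exactly the listed sets) is an assumption about how the bars are to be read, not a statement about the library's internals.

```python
from itertools import combinations

# Hypothetical detected proteins (intensity > 0) per replicate of one group
detected = {
    "Rep1": {"P1", "P2", "P3"},
    "Rep2": {"P1", "P2", "P4"},
    "Rep3": {"P1", "P5"},
}


def exclusive_overlaps(sets):
    """Count proteins found in exactly the given combination of sets."""
    names = sorted(sets)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            result[combo] = len(inside - outside)
    return result


overlaps = exclusive_overlaps(detected)
for combo, count in overlaps.items():
    print(" & ".join(combo), count)
```

Because every protein falls into exactly one region, the counts add up to the total number of distinct proteins, regardless of how many sets are compared.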

To view adjustable parameters see “plot_venn_results_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

Note

A venn diagram can compare a maximum of 3 samples.

A bar-venn diagram can compare more than 3 samples.

If a group of the selected level has more than 3 replicates, only the bar-venn diagram is created.

If the selected level has more than 6 groups no diagram is created

In [18]: plots = plotter.plot_venn_results("lfq_log2", 1, close_plots=None, save_path=None)

In [19]: select_fig(plots, 0);

In [20]: select_fig(plots, 1);

Group diagrams¶

Created using: plot_venn_groups()

Venn diagrams serve as a graphical illustration of set theory. In the mspypeline, protein counts (proteins with an intensity value > 0) constitute the sets, and set relationships indicate the number of proteins that are shared between two or more sets. In this way the similarity of the detected proteins of the sets can be assessed.

The function get_venn_group_data() is used to calculate which proteins can be compared between groups or are unique for a group of the selected level (see Thresholds and Comparisons) and then counts these proteins per group.

The method then creates and saves both a venn diagram using save_venn() and a bar-venn diagram using save_bar_venn() comparing the similarity of the groups on the selected level (based on protein counts). The ordinary venn diagram is quite intuitive, but it supports a maximum of three comparisons in the mspypeline. The bar-venn diagram has the advantage of allowing an unlimited number of comparison sets. These figures consist of two combined graphs. The upper bar diagram indicates the number of unique or shared proteins of a set or of overlapping sets. The lower graph indicates which set or sets are being compared, i.e. which protein count (upper graph) belongs to which comparison (lower graph).

To view adjustable parameters see “plot_venn_groups_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

Note

A venn diagram can compare a maximum of 3 samples.

A bar-venn diagram can compare more than 3 samples.

If the selected level has more than 3 groups, only the bar-venn diagram is created.

If the selected level has more than 6 groups no diagram is created

Note

To determine which proteins can be compared between the groups and which are unique for one group an internal threshold function is applied.

In [21]: plotter.plot_venn_groups("lfq_log2", 0, close_plots=None, save_path=savefig_dir, fig_format=".png");

_images/venn_replicate_group_level_0_lfq_log2_level_0.png

_images/venn_bar_group_level_0_lfq_log2_level_0.png

Principal Component analysis (PCA) overview¶

Created using: plot_pca_overview()

With the option to perform PCA, data can be studied for its variance and in doing so, parameters can be determined that have most strongly affected the variability between samples. The created PCA compares all components against each other (default = 2 components).

PCA results are calculated using get_pca_data() that gets protein intensities for all samples per group, processes data according to the given arguments, and then performs a dimensionality reduction (PCA) using sklearn.decomposition.PCA. Multiple different analysis options can be chosen to generate a PCA (see: multiple option config).

The results do not change in dependence on the chosen level, however, determining the level on which the data should be compared influences the coloring of the scatter elements. Each group of the selected level is colored differently. The data is plotted and saved using save_pca_results().

To view adjustable parameters see “plot_pca_overview_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery


In [22]: plotter.plot_pca_overview("lfq_log2", 1, save_path=savefig_dir, fig_format=".png");

_images/pca_overview_lfq_log2_level_1.png

Intensity histogram¶

Created using: plot_intensity_histograms()

Uses get_intensity_histograms_data() to get protein intensity values for each sample per group of the selected level.

The intensity values of each sample are binned (default = 25) and the data of each sample from a group of the selected level is plotted and saved in one histogram using save_intensity_histogram_results().

If the parameter “show_mean” is set to True in the configs the mean intensity of the plotted samples of a group is shown as gray dashed line.

To view adjustable parameters see “plot_intensity_histograms_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

In [23]: plotter.plot_intensity_histograms("lfq_log2", 1, save_path=None);

Relative standard deviation (std)¶

Created using: plot_relative_std()

Illustrates the relative standard deviation of the proteins between samples of a group which can help to understand how much fluctuation of the measured intensities is present between the replicates. Low deviation indicates that measured intensities are stable over multiple samples.

For each group of the selected level one plot is created.

The method applies get_relative_std_data() to calculate which proteins of a group can be used for the analysis (see Thresholds and Comparisons) and to filter out proteins below the threshold. Then, save_relative_std_results() is used to calculate the relative standard deviation and plot and save the data.

Lines drawn in different shades of blue indicate arbitrarily chosen thresholds of 10%, 20% and 30% relative std, together with the number of proteins whose relative std lies below these values.
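A minimal sketch of the calculation is given here, assuming that the relative standard deviation is the sample standard deviation divided by the mean of the replicate intensities. The data are made up, and both the scale of the intensities and the exact estimator used by mspypeline are assumptions.

```python
from statistics import mean, stdev

# Hypothetical intensities of three replicates of one group
group = {
    "P1": [100.0, 105.0, 95.0],
    "P2": [200.0, 250.0, 150.0],
    "P3": [50.0, 75.0, 25.0],
}


def relative_std(values):
    """Sample standard deviation relative to the mean."""
    return stdev(values) / mean(values)


rel = {protein: relative_std(values) for protein, values in group.items()}

# Number of proteins below each of the thresholds drawn in the plot
below = {t: sum(1 for r in rel.values() if r < t) for t in (0.1, 0.2, 0.3)}
print(rel, below)
```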

To view adjustable parameters see “plot_relative_std_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

Note

To determine which proteins can be compared between the two samples an internal threshold function is applied.

In [24]: plotter.plot_relative_std("lfq_log2", 0, save_path=None);

Scatter replicates¶

Created using: plot_scatter_replicates()

Uses get_scatter_replicates_data() to retrieve protein intensity values for each sample of a selected group.

For all samples/replicates per group of the selected level, pairwise comparisons of the protein intensities are plotted and their squared Pearson correlation coefficient (r^2) is calculated.

Unique proteins per replicate are shown at the bottom and right side of the graph (missing values are replaced by the minimum value of the data set).

The calculated r^2 is additionally visualized in the form of a correlation heatmap.

For a group with more than 2 replicates, each pairwise comparison of the replicates is calculated and plotted together in one graph. For every group of the selected level one scatter plot and one correlation heatmap is created and saved using save_scatter_replicates_results().
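The correlation value of one pairwise comparison can be sketched in plain Python. The replicate values and the helper pearson_r are hypothetical; restricting the calculation to proteins measured in both replicates is an assumption made for this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long lists, skipping pairs with NaN."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 intensities of the same five proteins in two replicates
rep1 = [20.0, 22.0, 24.0, math.nan, 28.0]
rep2 = [21.0, 23.0, 25.0, 26.0, 29.0]

r = pearson_r(rep1, rep2)
r_squared = r ** 2
print(r, r_squared)
```

The fourth protein is missing in rep1 and is therefore excluded from the correlation; in the plots such proteins appear at the edge of the graph instead.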

To view adjustable parameters see “plot_scatter_replicates_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

In [25]: plotter.plot_scatter_replicates("lfq_log2", 1, save_path=savefig_dir, fig_format=".png");

_images/scatter_replicates_H838_unst_lfq_log2_level_1.png

_images/scatter_replicates_H838_unst_correlation_heatmap_lfq_log2_level_1.png

Experiment comparison¶

Created using: plot_experiment_comparison()

To generate the experiment comparison plot, the function get_experiment_comparison_data() is used to retrieve protein intensity values for all samples of a given group and to classify those proteins that can be compared between groups and those that are unique for each group (see Thresholds and Comparisons). Then the mean intensity of these proteins is calculated.

For all groups of the selected level, pairwise comparisons of the protein intensities are plotted and their squared Pearson correlation coefficient (r^2) is calculated.

Unique proteins per group are shown at the bottom and right side of the graph (missing values are substituted by the minimum value of the data set).

The calculated r^2 is additionally visualized in the form of a correlation heatmap.

For every pairwise comparison of the groups from the selected level, one scatter plot is created and the results of all pairwise comparisons together are visualized in one combined correlation heatmap using save_scatter_replicates_results().

To view adjustable parameters see “plot_experiment_comparison_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

Note

To determine which proteins can be compared between the two groups and which are unique for one group an internal threshold function is applied.

In [26]: plotter.plot_experiment_comparison("lfq_log2", 0, save_path=savefig_dir, fig_format=".png");

_images/scatter_comparison_H1975_vs_H838_lfq_log2_level_0.png

_images/scatter_comparison_correlation_heatmap_lfq_log2_level_0.png

Rank¶

Created using: plot_rank()

In the rank plot all proteins are sorted by intensity value using get_rank_data() and plotted against their rank. For every group of the selected level one plot is created and saved by save_rank_results(), averaging the protein intensities of the replicates of a group.

The highest intensity accounts for rank 0, the lowest intensity for the number of proteins - 1 whereby proteins with missing values are neglected. The median intensity of all proteins is given in the legend.
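The ranking rule can be sketched as follows. The intensities, the pathway members and the helper rank_proteins are hypothetical and only illustrate the rule that the highest intensity receives rank 0 and proteins with missing values are left out.

```python
import math
from statistics import median

# Hypothetical mean intensities of one group; NaN marks a missing value
mean_intensity = {"P1": 30.0, "P2": 22.5, "P3": math.nan, "P4": 27.0, "P5": 25.0}


def rank_proteins(values):
    """Rank proteins by descending intensity, ignoring missing values."""
    present = {p: v for p, v in values.items() if not math.isnan(v)}
    ordered = sorted(present, key=present.get, reverse=True)
    return {protein: rank for rank, protein in enumerate(ordered)}


ranks = rank_proteins(mean_intensity)
median_intensity = median(v for v in mean_intensity.values() if not math.isnan(v))

# Median rank of a hypothetical pathway; members without a rank are skipped
pathway = {"P2", "P3", "P4"}
pathway_median_rank = median(ranks[p] for p in pathway if p in ranks)
print(ranks, median_intensity, pathway_median_rank)
```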

Pathway analysis protein lists can be applied to the rank plot to provide information about the median intensity or rank of pathways of interest. If a protein is part of a selected pathway, it is presented in color and the median rank of all proteins of that pathway is indicated. Multiple pathways can be selected and are then represented in the same graph as distinct groups.

To view adjustable parameters see “plot_rank_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

In [27]: plotter.plot_rank("lfq_log2", 0, save_path=savefig_dir, fig_format=".png");

Statistical inference plots¶

Pathway analysis¶

Created using: plot_pathway_analysis()

In the pathway analysis, for each protein of a desired pathway a subplot is created displaying the intensities of the protein for all groups of the selected level.

First, get_pathway_analysis_data() is used to filter out all proteins of the desired pathways for all samples per group of the selected level. The function then determines which of those proteins can be compared between samples (see Thresholds and Comparisons) and significances of these protein intensities are calculated for each pairwise comparison between groups with an independent t-test. P value thresholds are set to the following: * is p < 0.05, ** is p < 0.005, and *** is p < 0.0005. For every selected pathway, two figures are created and saved using save_pathway_analysis_results(), one displaying the significances and the other not displaying them.
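The mapping from p-values to significance markers stated above can be written as a small helper. The function name significance_stars is hypothetical; only the three thresholds are taken from the text.

```python
def significance_stars(p_value):
    """Translate a p-value into the star notation used in the plot."""
    if p_value < 0.0005:
        return "***"
    if p_value < 0.005:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant, no marker


for p in (0.2, 0.03, 0.004, 0.0001):
    print(p, repr(significance_stars(p)))
```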

For a group of multiple samples, the protein intensity is plotted for each sample (single scatter dot) which are jointly presented in uniform coloring.

To view adjustable parameters see “plot_pathway_analysis_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

Note

To determine which proteins can be compared between two groups an internal threshold function is applied.

The effect of choosing different levels for the analysis on the results can be appreciated in the pathway analyses
shown below. Both figures show the protein intensities of the Biocarta EGF pathway, however
calculation was performed for different analysis levels. Here, the choice of the analysis
depth determines which samples are considered a “group”.
In the top figure where the analysis was performed on level 0 which consists of two groups (H1975 & H838), all samples
belonging to any one of them are grouped together.
In the bottom figure, where the analysis was performed on the next higher level (level 1), the two groups of level 0
are further subdivided into a total of four different groups, to each of which (only) 3 samples are assigned.
Statistical analyses are always performed between two “groups” of samples and require a minimum of 3 samples per group to
indicate significances.
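How the analysis level changes the grouping can be sketched with a few lines of Python. The sample names below are hypothetical (in particular the condition "stim" and the "Rep" suffix are made up), and the sketch assumes that samples are named with underscore-separated parts, so that level k groups samples by their first k + 1 name parts.

```python
from collections import defaultdict

# Hypothetical sample names: cell line, condition, replicate
samples = [
    f"{cell}_{condition}_Rep{i}"
    for cell in ("H1975", "H838")
    for condition in ("unst", "stim")
    for i in (1, 2, 3)
]


def group_by_level(names, level):
    """Group sample names by their first level + 1 underscore-separated parts."""
    groups = defaultdict(list)
    for name in names:
        key = "_".join(name.split("_")[: level + 1])
        groups[key].append(name)
    return dict(groups)


level0 = group_by_level(samples, 0)
level1 = group_by_level(samples, 1)
print({group: len(members) for group, members in level0.items()})
print({group: len(members) for group, members in level1.items()})
```

With these names, level 0 yields two groups of six samples and level 1 yields four groups of three samples, mirroring the situation described above.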

In [28]: plots_level0 = plotter.plot_pathway_analysis("lfq_log2", 0, close_plots=None, save_path=None)

In [29]: plots_level1 = plotter.plot_pathway_analysis("lfq_log2", 1, close_plots=None, save_path=None)

In [30]: select_fig(plots_level0, 0);

In [31]: select_fig(plots_level1, 0);

Go analysis¶

Created using: plot_go_analysis()

In the GO analysis, an enrichment analysis is performed for each selected GO term file (based on protein counts, i.e. proteins with an intensity value > 0). For this analysis get_go_analysis_data() is used to calculate the number of detected proteins from a GO term that are found in each group of the selected level. This number is illustrated as the length of the corresponding bar. P values shown at the end of a bar indicate the calculated significance. Samples referred to as “Total” represent the complete data set, and the numbers at the top of the graph correspond to the count of detected proteins in all samples over the total number of proteins in the GO term. The data of all chosen pathways is plotted and saved in one graph using save_go_analysis_results().

For p-value calculation, first, for each GO term, a list “pathway_genes” is created by taking the intersection of the proteins from the GO list and the total detected proteins.

Second, a list of “non_pathway_genes” is created, which comprises all detected proteins except those in “pathway_genes”.

Third, lists of “experiment_genes” and “non_experiment_genes” are created in a similar fashion, where an experiment refers to a sample or group of samples of the data set.

Lastly, a one-tailed Fisher's exact test is calculated to retrieve statistical significances based on the following contingency table:

                    in pathway                             not in pathway
in experiment       experiment_genes & pathway_genes       experiment_genes & not_pathway_genes
not in experiment   not_experiment_genes & pathway_genes   not_experiment_genes & not_pathway_genes

The resulting p-value is thus also dependent on the overall protein count of the sample or group of samples. A sample is considered significant if the p value is < 0.05.
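The steps above can be sketched with Python sets and the hypergeometric distribution. All protein identifiers are made up, the helper fisher_exact_greater is hypothetical, and testing in the direction of enrichment (at least as many pathway proteins in the experiment as observed) is an assumption about which tail is used.

```python
from math import comb

# Hypothetical data: 20 detected proteins, a GO term with one undetected member
detected = {f"P{i}" for i in range(20)}
go_term = {f"P{i}" for i in range(6)} | {"P99"}

pathway_genes = go_term & detected
non_pathway_genes = detected - pathway_genes
experiment_genes = {f"P{i}" for i in range(5)} | {"P10", "P11", "P12"}
non_experiment_genes = detected - experiment_genes

# Cells of the 2 x 2 contingency table
a = len(experiment_genes & pathway_genes)
b = len(experiment_genes & non_pathway_genes)
c = len(non_experiment_genes & pathway_genes)
d = len(non_experiment_genes & non_pathway_genes)


def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher's exact test: probability of a or more in the top-left cell."""
    row1, col1, n = a + b, a + c, a + b + c + d
    numerator = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return numerator / comb(n, row1)


p_value = fisher_exact_greater(a, b, c, d)
print((a, b, c, d), p_value)
```

Because the table margins enter the calculation, the same number of pathway proteins can yield different p-values for samples with different overall protein counts.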

To view adjustable parameters see “plot_go_analysis_settings:” in the Adjustable Options Configs

For overview of plots see analysis options

For exemplary plot see gallery

In [32]: plotter.plot_go_analysis("lfq_log2", 1, save_path=None);

Volcano plot (R)¶

Created using: plot_r_volcano()

A volcano plot illustrates the statistical inferences from a pairwise comparison of two groups.
The plot shows the log2 fold change between two different conditions against the -log10(p-value)
(based on protein intensities). The p-value and adjusted p-value (Benjamini-Hochberg) are determined using the R
limma package (moderated t-statistic). Additionally,
calculations are corrected for the intensity-variance relationship. For the calculation
of all these parameters get_r_volcano_data() is applied.
Dashed lines indicate the fold change cutoff (default = log2(2)) and p-value cutoff (default = p < 0.05) by
which proteins are considered significant (blue and red) or non-significant (gray). Measured intensities of
unique proteins are indicated at the sides of the volcano plot for each group (light blue and orange).
Volcano plots also permit the annotation of mapped proteins. This can be achieved by labeling a number of
the most significant proteins for each group or by selecting a
pathway analysis protein list.
For every pairwise comparison of the groups of the selected level two volcano plots are created and saved
using save_volcano_results(), where one plot has a set of
proteins annotated and the other does not.
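How the two cutoffs divide the proteins can be sketched in plain Python. The fold changes and p-values are made up, the helper classify is hypothetical, and whether values lying exactly on a cutoff count as significant is an assumption. The statistics themselves come from limma in the actual analysis and are not recomputed here.

```python
import math

FC_CUTOFF = math.log2(2)  # default fold change cutoff, equals 1.0 on the log2 scale
P_CUTOFF = 0.05           # default p-value cutoff


def classify(log2_fc, p_value, fc_cutoff=FC_CUTOFF, p_cutoff=P_CUTOFF):
    """Label a protein according to the fold change and p-value cutoffs."""
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        return "up"
    if p_value < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"
    return "non-significant"


# Hypothetical results: protein -> (log2 fold change, p-value)
results = {
    "P1": (1.8, 0.001),
    "P2": (-2.1, 0.01),
    "P3": (0.4, 0.0001),
    "P4": (2.5, 0.2),
}

# x position, y position and label of each point in the volcano plot
points = {
    protein: (fc, -math.log10(p), classify(fc, p))
    for protein, (fc, p) in results.items()
}
for protein, point in points.items():
    print(protein, point)
```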

To view adjustable parameters see “plot_r_volcano_settings:” in the Adjustable Options Configs
For overview of plots see analysis options
For exemplary plot see gallery

Note

should be used with log2 intensities
minimum of 3 samples per group required

Note

To determine which proteins can be compared between the two groups and which are unique for one group an internal threshold function is applied.

In the volcano plot shown here, the 10 most significant proteins for each group are annotated. However, if proteins of a specific pathway should be annotated, this can be achieved by selecting one or more pathway lists.

In [33]: print("pass")
pass

# plotter_with_tech_reps.plot_r_volcano("lfq_log2", 0, sample1="H1975", sample2="H838", adj_pval=False, save_path=savefig_dir, fig_format=".png");

_images/volcano_H1975_H838_annotation_adjusted_p_value__lfq_log2.png

Additionally via python¶

Kernel density estimate plot¶

Created using: plot_kde()

In the kernel density estimate (KDE) plot, one density graph per sample is plotted indicating the intensity (derived from get_kde_data()) on the x axis and the density on the y axis. The data is plotted and saved using save_kde_results().

These plots should be presented on a log2 scale.

The KDE is well suited to study the influence of different normalization methods and protein intensities on the data, which is why it is part of the Normalization overview.
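For readers unfamiliar with kernel density estimates, the following sketch computes a Gaussian KDE by hand. The intensities, the fixed bandwidth and the helper gaussian_kde are hypothetical; the plotting code of mspypeline most likely relies on an existing library implementation with automatic bandwidth selection, so this is only meant to show what the density curve represents.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


# Hypothetical log2 intensities of one sample
log2_intensities = [22.0, 23.5, 24.0, 24.5, 26.0, 27.0]
density = gaussian_kde(log2_intensities, bandwidth=0.8)

# Evaluate the curve on a grid, as it would be drawn along the x axis
grid = [10 + 0.05 * i for i in range(601)]
curve = [density(x) for x in grid]
print(max(curve))
```

Each sample contributes one such curve; shifts or shape differences between the curves of different samples hint at systematic intensity differences that a normalization may remove.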

For overview of plots see analysis options

For exemplary plot see gallery

In the figure shown below, the effect of the two different protein intensity types is presented. The two top graphs show the “raw” and “lfq” intensities, while the two bottom graphs demonstrate the data preprocessed with two different normalizations “tail robust quantile normalization” and “tail robust quantile normalization with missing value handling”.

The KDE can thus help to understand how intensity types or normalization methods may influence the data.

In [34]: plotter.add_normalized_option("raw", plotter.normalizers["trqn"], "trqn")

In [35]: plotter.add_normalized_option("raw", plotter.normalizers["trqn_missing_handled"], "trqn_missing_handled")

In [36]: plotter.plot_kde("raw_log2", 3, save_path=None);

In [37]: plotter.plot_kde("lfq_log2", 3, save_path=None);

In [38]: plotter.plot_kde("raw_trqn_log2", 3, save_path=None);

In [39]: plotter.plot_kde("raw_trqn_missing_handled_log2", 3, save_path=None);

Boxplot¶

Created using: plot_boxplot()

A standard boxplot displaying the distribution of five quantiles per group of the selected level, with the groups ranked by median intensity from the bottom of the graph to the top.

The plot is created by applying get_pca_data() to get protein intensities for all samples per group of the selected level and then sorting the samples by their median intensity. Data is plotted and saved using save_boxplot_results().
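The quantities behind such a boxplot can be sketched with the standard library. The group names, intensities and the helper five_numbers are hypothetical, and the use of minimum and maximum as whisker ends is a simplification; plotting libraries often limit whiskers to 1.5 times the interquartile range.

```python
from statistics import median, quantiles

# Hypothetical log2 intensities per group
groups = {
    "A": [20.0, 22.0, 24.0, 26.0, 28.0],
    "B": [19.0, 21.0, 23.0, 25.0, 27.0],
}


def five_numbers(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


summary = {name: five_numbers(values) for name, values in groups.items()}

# Groups ordered by median intensity, lowest first (bottom of the graph)
order = sorted(groups, key=lambda name: median(groups[name]))
print(summary, order)
```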

The boxplot is part of the Normalization overview.

For overview of plots see analysis options

For exemplary plot see gallery

In [40]: plotter.plot_boxplot("lfq_log2", 3, save_path=None);

Number of Proteins vs Quantiles¶

Created using: plot_n_proteins_vs_quantile()

Plots the quantile protein intensities against the number of identified proteins per sample. get_n_protein_vs_quantile_data() is used to get protein intensities for all samples per group and subsequently count the number of intensity values > 0 (total number of detected proteins) and the quantiles per sample. The data is visualized and saved by save_n_proteins_vs_quantile_results().

Samples are indicated as a horizontal line of scatter dots, where the color and x position of a dot indicate the intensity value of the respective quantile. The y position of the dots of a sample corresponds to the total number of detected proteins in that sample.

Solid, rather vertical lines indicate a linear fit of each quantile for all the samples.

This plot is part of the Normalization overview.

For overview of plots see analysis options

For exemplary plot see gallery

In [41]: plotter.plot_n_proteins_vs_quantile("lfq_log2", 3, save_path=savefig_dir, fig_format=".png");

_images/n_proteins_vs_quantile_lfq_log2_level_3.png

Intensity Heatmap¶

Created using: plot_intensity_heatmap()

The intensity heatmap demonstrates protein intensities (derived from get_intensity_heatmap_data()), where samples are given in rows on the y axis and proteins on the x axis. Missing values are colored in gray. The data is plotted and saved using save_intensities_heatmap_result().

The heatmap can be used to spot patterns in the different normalization methods and to understand how different intensity types affect the data.

The Heatmap overview is created from a series of intensity heatmap plots.

For overview of plots see analysis options

For exemplary plot see gallery

In the heatmap shown below, samples are sorted by the number of missing values and proteins are ranked by the number of missing values across all samples. So, depending on the defined preferences, the heatmap can, for instance, be used to gather information about the distribution of missing values or the influence of the normalization method by the appearance of patterns.
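The sorting by missing values can be sketched as follows. The sample names, protein identifiers and intensities are made up, and the ascending sort direction is an assumption; the options sort_index_by_missing and sort_columns_by_missing of the actual method may order the axes differently.

```python
import math

nan = math.nan
proteins = ["P1", "P2", "P3", "P4"]

# Hypothetical intensity matrix: sample -> intensities in the order of `proteins`
matrix = {
    "S1": [25.0, nan, 22.0, nan],
    "S2": [24.5, 21.0, 22.3, nan],
    "S3": [25.2, 21.4, 22.1, 20.0],
}


def n_missing(values):
    """Number of missing (NaN) entries in a list of intensities."""
    return sum(1 for v in values if math.isnan(v))


# Rows (samples) and columns (proteins) ordered by their number of missing values
row_order = sorted(matrix, key=lambda sample: n_missing(matrix[sample]))
col_index = sorted(
    range(len(proteins)),
    key=lambda j: n_missing([matrix[sample][j] for sample in matrix]),
)
col_order = [proteins[j] for j in col_index]
sorted_matrix = [[matrix[sample][j] for j in col_index] for sample in row_order]
print(row_order, col_order)
```

After sorting, missing values cluster in one corner of the matrix, which makes their distribution across samples and proteins easier to see.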

In [42]: plotter.plot_intensity_heatmap("lfq_log2", 2, sort_index_by_missing=True, sort_columns_by_missing=True, save_path=None);

In [43]: plotter.plot_intensity_heatmap("raw_log2", 2, sort_index_by_missing=True, sort_columns_by_missing=True, save_path=None);