Skip to content

Drift Explain

The Drift Explain module (introduced in v0.3.0) provides detailed insights into why drift was detected and how distributions have shifted.

Overview

When drift is detected, you often want to understand:

  • How much did the distribution shift?
  • Which statistics changed the most?
  • What does it look like visually?

DriftWatch provides two main classes for this:

Class Purpose
DriftExplainer Detailed statistical analysis
DriftVisualizer Histogram overlays and plots

Installation

For visualization support, install with the viz extra:

pip install driftwatch[viz]

DriftExplainer

The DriftExplainer provides detailed statistics for understanding drift.

Basic Usage

from driftwatch import Monitor
from driftwatch.explain import DriftExplainer

# Setup
monitor = Monitor(reference_data=train_df)
report = monitor.check(production_df)

# Explain drift
explainer = DriftExplainer(train_df, production_df, report)
explanation = explainer.explain()

# Display summary
print(explanation.summary())

Statistics Provided

For each numeric feature, you get:

Statistic Description
mean_shift Absolute change in mean
mean_shift_percent Relative change (%)
std_change Absolute change in standard deviation
std_change_percent Relative change (%)
ref_min, prod_min Minimum values
ref_max, prod_max Maximum values
quantile_stats Q25, Q50, Q75 comparisons

Example Output

━━━ age ━━━
Status: 🔴 DRIFT DETECTED
Score (psi): 2.9555

📊 Central Tendency:
  Mean: 30.0234 → 40.1567 (+33.75%)

📈 Spread:
  Std: 5.0123 → 4.9876 (-0.49%)

📏 Range:
  Min: 15.2341 → 25.1234
  Max: 44.8765 → 55.3421

📐 Quantiles:
  Q25: 26.5432 → 36.4321 (+37.25%)
  Q50: 29.8765 → 40.0123 (+33.93%)
  Q75: 33.2109 → 43.7654 (+31.79%)

Explain Single Feature

# Get explanation for specific feature
age_exp = explainer.explain_feature("age")

print(f"Mean shifted by {age_exp.mean_shift_percent:.1f}%")
print(f"Std changed by {age_exp.std_change_percent:.1f}%")

Custom Quantiles

# Analyze different quantiles
explainer = DriftExplainer(
    train_df, 
    production_df, 
    report,
    quantiles=[0.1, 0.5, 0.9]  # 10th, 50th, 90th percentiles
)

Export to JSON

import json

# Export for logging/analysis
data = explanation.to_dict()
print(json.dumps(data, indent=2, default=str))

DriftVisualizer

The DriftVisualizer creates histogram overlays to visualize distribution shifts.

Basic Usage

from driftwatch.explain import DriftVisualizer
import matplotlib.pyplot as plt

viz = DriftVisualizer(train_df, production_df, report)

# Plot single feature
fig = viz.plot_feature("age")
plt.show()

Plot All Features

# Grid of all numeric features
fig = viz.plot_all(cols=2)
plt.show()

Customization

# Customize appearance
fig = viz.plot_feature(
    "age",
    bins=30,              # Number of histogram bins
    figsize=(12, 8),      # Figure size
    show_stats=True,      # Show stats box
    alpha=0.7             # Transparency
)

Save to File

# Save single feature
viz.save("age_drift.png", feature_name="age", dpi=150)

# Save all features
viz.save("drift_report.png", dpi=150)
viz.save("drift_report.pdf")  # Vector format

Complete Example

import pandas as pd
import numpy as np
from driftwatch import Monitor
from driftwatch.explain import DriftExplainer, DriftVisualizer

# Generate data
np.random.seed(42)
train_df = pd.DataFrame({
    'age': np.random.normal(30, 5, 1000),
    'income': np.random.normal(50000, 10000, 1000),
})

prod_df = pd.DataFrame({
    'age': np.random.normal(40, 5, 1000),  # Drift!
    'income': np.random.normal(50000, 10000, 1000),
})

# Detect drift
monitor = Monitor(reference_data=train_df)
report = monitor.check(prod_df)

# Explain
explainer = DriftExplainer(train_df, prod_df, report)
explanation = explainer.explain()
print(explanation.summary())

# Visualize
viz = DriftVisualizer(train_df, prod_df, report)
viz.save("drift_analysis.png")

API Reference

driftwatch.explain.DriftExplainer

DriftExplainer(reference_data: DataFrame, production_data: DataFrame, report: DriftReport, quantiles: list[float] | None = None)

Explains drift detection results with detailed statistics.

The explainer takes reference and production DataFrames along with a DriftReport and provides detailed insights into why drift was detected and how distributions have shifted.

Example

from driftwatch import Monitor from driftwatch.explain import DriftExplainer

monitor = Monitor(reference_data=train_df) report = monitor.check(prod_df)

explainer = DriftExplainer(train_df, prod_df, report) explanation = explainer.explain() print(explanation.summary())

Initialize the DriftExplainer.

Parameters:

Name Type Description Default
reference_data DataFrame

Reference DataFrame (training data)

required
production_data DataFrame

Production DataFrame to explain

required
report DriftReport

DriftReport from Monitor.check()

required
quantiles list[float] | None

List of quantiles to analyze (default: [0.25, 0.5, 0.75])

None

explain

explain() -> DriftExplanation

Generate detailed explanations for all features.

Returns:

Type Description
DriftExplanation

DriftExplanation containing per-feature statistical analysis

explain_feature

explain_feature(feature_name: str) -> FeatureExplanation | None

Generate detailed explanation for a single feature.

Parameters:

Name Type Description Default
feature_name str

Name of the feature to explain

required

Returns:

Type Description
FeatureExplanation | None

FeatureExplanation or None if feature not found

driftwatch.explain.DriftVisualizer

DriftVisualizer(reference_data: DataFrame, production_data: DataFrame, report: DriftReport, style: str = 'seaborn-v0_8-whitegrid', colors: dict[str, str] | None = None)

Visualizes drift between reference and production distributions.

Creates matplotlib figures showing distribution overlays, helping users understand exactly how data has shifted.

Example

from driftwatch import Monitor from driftwatch.explain import DriftVisualizer

monitor = Monitor(reference_data=train_df) report = monitor.check(prod_df)

viz = DriftVisualizer(train_df, prod_df, report) fig = viz.plot_feature("age") plt.show()

Or plot all features

fig = viz.plot_all() plt.savefig("drift_report.png")

Initialize the DriftVisualizer.

Parameters:

Name Type Description Default
reference_data DataFrame

Reference DataFrame (training data)

required
production_data DataFrame

Production DataFrame

required
report DriftReport

DriftReport from Monitor.check()

required
style str

Matplotlib style to use (default: seaborn-v0_8-whitegrid)

'seaborn-v0_8-whitegrid'
colors dict[str, str] | None

Custom color scheme override

None

plot_feature

plot_feature(feature_name: str, bins: int = 50, figsize: tuple[int, int] = (10, 6), show_stats: bool = True, alpha: float = 0.6, colors: dict[str, str] | None = None, title: str | None = None, xlabel: str | None = None, ylabel: str | None = None, hist_kwargs: dict[str, Any] | None = None, stats_kwargs: dict[str, Any] | None = None) -> Any

Plot histogram overlay for a single feature.

Parameters:

Name Type Description Default
feature_name str

Name of the feature to plot

required
bins int

Number of histogram bins

50
figsize tuple[int, int]

Figure size (width, height)

(10, 6)
show_stats bool

Whether to show statistical annotations

True
alpha float

Transparency of histograms

0.6
colors dict[str, str] | None

Custom colors for this plot (keys: reference, production)

None
title str | None

Custom title override

None
xlabel str | None

Custom x-axis label

None
ylabel str | None

Custom y-axis label

None
hist_kwargs dict[str, Any] | None

Additional arguments passed to ax.hist

None
stats_kwargs dict[str, Any] | None

Additional arguments passed to the stats text box

None

Returns:

Type Description
Any

matplotlib Figure object

Raises:

Type Description
ImportError

If matplotlib is not installed

ValueError

If feature not found in data

plot_all

plot_all(cols: int = 2, figsize: tuple[int, int] | None = None, bins: int = 50, alpha: float = 0.6, colors: dict[str, str] | None = None, hist_kwargs: dict[str, Any] | None = None) -> Any

Plot histogram overlays for all numeric features.

Parameters:

Name Type Description Default
cols int

Number of columns in the grid

2
figsize tuple[int, int] | None

Figure size (auto-calculated if None)

None
bins int

Number of histogram bins

50
alpha float

Transparency of histograms

0.6
colors dict[str, str] | None

Custom colors for this plot (keys: reference, production)

None
hist_kwargs dict[str, Any] | None

Additional arguments passed to ax.hist

None

Returns:

Type Description
Any

matplotlib Figure object

save

save(filename: str, feature_name: str | None = None, dpi: int = 150, **kwargs: Any) -> str

Save visualization to file.

Parameters:

Name Type Description Default
filename str

Output filename (supports png, pdf, svg)

required
feature_name str | None

Specific feature to plot (or all if None)

None
dpi int

Resolution for raster formats

150
**kwargs Any

Additional arguments passed to savefig

{}

Returns:

Type Description
str

The filename that was saved