# Monitor API
The Monitor class is the main entry point for drift detection.
## Overview
The Monitor analyzes feature distributions between reference (training) and production data to detect drift.
## Quick Example

```python
from driftwatch import Monitor
import pandas as pd

# Load training data
train_df = pd.read_parquet("train.parquet")

# Create monitor
monitor = Monitor(
    reference_data=train_df,
    thresholds={
        "psi": 0.2,
        "ks_pvalue": 0.05,
    },
)

# Check production data
prod_df = pd.read_parquet("prod.parquet")
report = monitor.check(prod_df)
```
## API Reference

### driftwatch.core.monitor.Monitor

```python
Monitor(
    reference_data: DataFrame,
    features: list[str] | None = None,
    model: Any | None = None,
    thresholds: dict[str, float] | None = None,
)
```
Main class for monitoring data and model drift.
The Monitor compares production data against a reference dataset (typically training data) to detect distribution shifts.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `reference_data` | `DataFrame` | Reference DataFrame (training data) | *required* |
| `features` | `list[str] \| None` | List of feature columns to monitor. If `None`, all columns are monitored. | `None` |
| `model` | `Any \| None` | Optional ML model for prediction drift detection | `None` |
| `thresholds` | `dict[str, float] \| None` | Dictionary of threshold values for drift detection. Supported keys: `"psi"`, `"ks_pvalue"`, `"wasserstein"`, `"chi2_pvalue"` | `None` |
Example:

```python
monitor = Monitor(
    reference_data=train_df,
    features=["age", "income", "category"],
    thresholds={"psi": 0.2, "ks_pvalue": 0.05},
)
report = monitor.check(production_df)
print(report.has_drift())
```
Raises:

| Type | Description |
|---|---|
| `ValueError` | If reference data is empty. |
Source code in src/driftwatch/core/monitor.py
#### check

```python
check(production_data: DataFrame) -> DriftReport
```

Check for drift between reference and production data.

Each monitored feature in the production dataset is compared against the reference dataset using an appropriate detector.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `production_data` | `DataFrame` | Production DataFrame to compare | *required* |

Returns:

| Type | Description |
|---|---|
| `DriftReport` | `DriftReport` containing per-feature and aggregate drift results |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If production data is empty or missing features |
Source code in src/driftwatch/core/monitor.py
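To make the `ks_pvalue` threshold concrete, here is a minimal, self-contained sketch of the two-sample Kolmogorov–Smirnov decision rule that this kind of detector applies per numeric feature. This is illustrative only, not driftwatch's internals; the p-value uses the standard one-term asymptotic approximation, which is reasonable for large samples.

```python
import math
import numpy as np

def ks_two_sample(a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """Two-sample KS statistic and an asymptotic p-value approximation."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    # Empirical CDFs of both samples evaluated on the pooled grid.
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    d = float(np.max(np.abs(cdf_a - cdf_b)))
    n_eff = len(a) * len(b) / (len(a) + len(b))
    # Leading term of the Kolmogorov distribution tail: P(D > d) ~ 2 exp(-2 d^2 n_eff).
    p_value = min(1.0, 2.0 * math.exp(-2.0 * n_eff * d * d))
    return d, p_value

rng = np.random.default_rng(42)
ref = rng.normal(0.0, 1.0, 5_000)   # stand-in for a reference feature
prod = rng.normal(0.3, 1.0, 5_000)  # production feature with a mean shift

d, p = ks_two_sample(ref, prod)
print(f"D={d:.3f}, p={p:.2e}, drift={p < 0.05}")
```

A detector with `ks_pvalue: 0.05` flags drift exactly when that p-value falls below 0.05.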
## Configuration

### Thresholds
Control sensitivity of drift detection:
```python
monitor = Monitor(
    reference_data=train_df,
    thresholds={
        "psi": 0.15,          # more sensitive than the default 0.2
        "ks_pvalue": 0.01,    # stricter than the default 0.05
        "chi2_pvalue": 0.05,  # the default
        "wasserstein": 0.2,   # for the Wasserstein detector
    },
)
```
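For intuition about what the `psi` values measure, here is a self-contained Population Stability Index sketch. It follows the standard PSI formula (binned on the reference sample), though driftwatch's exact binning strategy may differ; the conventional rule of thumb treats PSI below 0.1 as stable and above 0.2 as significant drift.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two 1-D samples."""
    # Bin edges come from the reference (expected) sample.
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Clip so out-of-range production values fall into the end bins.
    e_counts, _ = np.histogram(np.clip(expected, edges[0], edges[-1]), bins=edges)
    a_counts, _ = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)
    # A small floor avoids log(0) and division by zero in empty bins.
    e_pct = np.maximum(e_counts / e_counts.sum(), 1e-6)
    a_pct = np.maximum(a_counts / a_counts.sum(), 1e-6)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, 10_000)
same = rng.normal(0, 1, 10_000)     # same distribution: PSI near zero
shifted = rng.normal(1, 1, 10_000)  # one-sigma mean shift: PSI well above 0.2

print(psi(ref, same))
print(psi(ref, shifted))
```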
### Model Version Tracking

Track which model version is being monitored:

```python
monitor = Monitor(
    reference_data=train_df,
    model_version="v1.2.3",
)
report = monitor.check(prod_df)
print(report.model_version)  # "v1.2.3"
```
## Methods

### check()

Detect drift in production data:

```python
report = monitor.check(
    production_data=prod_df,
    # Optional: override thresholds per feature
    feature_thresholds={
        "age": {"psi": 0.1},  # more sensitive for age
    },
)
```
### update_reference()

Update reference data (e.g., after model retraining):

```python
# Retrain model with new data
new_train_df = pd.read_parquet("retrain_data.parquet")

# Update monitor
monitor.update_reference(new_train_df)
```
## Best Practices

### 1. Choose Appropriate Reference Data

Reference data should represent your model's training distribution:

```python
# ✓ Good: use the actual training data
monitor = Monitor(reference_data=train_df)

# ✗ Bad: using validation data with a different distribution
monitor = Monitor(reference_data=val_df)
```
### 2. Set Thresholds Based on Business Impact

```python
# High-stakes model: strict thresholds
critical_monitor = Monitor(
    reference_data=train_df,
    thresholds={"psi": 0.1, "ks_pvalue": 0.01},
)

# Exploratory model: relaxed thresholds
exploratory_monitor = Monitor(
    reference_data=train_df,
    thresholds={"psi": 0.3},
)
```
### 3. Version Your Reference Data

```python
import joblib

# Save the monitor for reproducibility
joblib.dump(monitor, f"monitor_v{model_version}.pkl")

# Load it later
monitor = joblib.load("monitor_v1.2.3.pkl")
```
## See Also

- Drift Detectors: available detection methods
- Reports: working with drift reports
- Thresholds Guide: tuning sensitivity