Monitor API

The Monitor class is the main entry point for drift detection.

Overview

The Monitor analyzes feature distributions between reference (training) and production data to detect drift.

Quick Example

from driftwatch import Monitor
import pandas as pd

# Load training data
train_df = pd.read_parquet("train.parquet")

# Create monitor
monitor = Monitor(
    reference_data=train_df,
    thresholds={
        "psi": 0.2,
        "ks_pvalue": 0.05,
    }
)

# Check production data
prod_df = pd.read_parquet("prod.parquet")
report = monitor.check(prod_df)

API Reference

driftwatch.core.monitor.Monitor

Monitor(reference_data: DataFrame, features: list[str] | None = None, model: Any | None = None, thresholds: dict[str, float] | None = None)

Main class for monitoring data and model drift.

The Monitor compares production data against a reference dataset (typically training data) to detect distribution shifts.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `reference_data` | `DataFrame` | Reference DataFrame (training data). | *required* |
| `features` | `list[str] \| None` | List of feature columns to monitor. If `None`, all columns are monitored. | `None` |
| `model` | `Any \| None` | Optional ML model for prediction drift detection. | `None` |
| `thresholds` | `dict[str, float] \| None` | Dictionary of threshold values for drift detection. Supported keys: `"psi"`, `"ks_pvalue"`, `"wasserstein"`, `"chi2_pvalue"`. | `None` |
Example

>>> monitor = Monitor(
...     reference_data=train_df,
...     features=["age", "income", "category"],
...     thresholds={"psi": 0.2, "ks_pvalue": 0.05},
... )
>>> report = monitor.check(production_df)
>>> print(report.has_drift())

Initialize the monitor with reference data and configuration.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `reference_data` | `DataFrame` | Reference DataFrame used as the baseline for drift detection. | *required* |
| `features` | `list[str] \| None` | List of feature columns to monitor. If `None`, all columns are monitored. | `None` |
| `model` | `Any \| None` | Optional machine learning model. | `None` |
| `thresholds` | `dict[str, float] \| None` | Optional dictionary overriding the default drift detection thresholds. | `None` |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the reference data is empty. |

Source code in src/driftwatch/core/monitor.py
def __init__(
    self,
    reference_data: pd.DataFrame,
    features: list[str] | None = None,
    model: Any | None = None,
    thresholds: dict[str, float] | None = None,
) -> None:
    """
    Initialize the monitor with reference data and configuration

    Args:
       reference_data : acts as reference dataframe used as baseline for drift detection.
       features : List of feature columns to monitor.
            If None, all columns are monitored.
       model: Optional machine learning model
            thresholds: optional dictionary overriding default drift detection thresholds.

    Raises:
        ValueError: if reference data is empty.

    """
    self._validate_reference_data(reference_data)

    self.reference_data = reference_data
    self.features = features or list(reference_data.columns)
    self.model = model
    self.thresholds = {**self.DEFAULT_THRESHOLDS, **(thresholds or {})}

    self._detectors: dict[str, BaseDetector] = {}
    self._setup_detectors()
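As the constructor body shows, user-supplied thresholds are merged over the class defaults, so a partial dictionary overrides only the keys it names. A minimal sketch of that merge semantics (the default values here are taken from the Configuration section below for illustration; check `Monitor.DEFAULT_THRESHOLDS` in your installed version):

```python
# Illustrative defaults; the real values live in Monitor.DEFAULT_THRESHOLDS.
DEFAULT_THRESHOLDS = {"psi": 0.2, "ks_pvalue": 0.05, "chi2_pvalue": 0.05}

overrides = {"psi": 0.1}  # tighten only the PSI threshold

# The same merge the constructor performs: later keys win.
thresholds = {**DEFAULT_THRESHOLDS, **(overrides or {})}
print(thresholds)  # {'psi': 0.1, 'ks_pvalue': 0.05, 'chi2_pvalue': 0.05}
```

Passing `thresholds=None` therefore leaves every default in place, while a one-key dictionary changes exactly that key.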

check

check(production_data: DataFrame) -> DriftReport

Check for drift between reference and production data.

Each monitored feature in the production dataset is compared against the reference dataset using an appropriate detector.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `production_data` | `DataFrame` | Production DataFrame to compare. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `DriftReport` | `DriftReport` containing per-feature and aggregate drift results. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the production data is empty or missing features. |

Source code in src/driftwatch/core/monitor.py
def check(self, production_data: pd.DataFrame) -> DriftReport:
    """
    Check for drift between reference and production data.

    Each monitored feature in the production dataset is compared against
    the reference dataset using an appropriate detector.

    Args:
        production_data: Production DataFrame to compare

    Returns:
        DriftReport containing per-feature and aggregate drift results

    Raises:
        ValueError: If production data is empty or missing features
    """

    self._validate_production_data(production_data)

    feature_results: list[FeatureDriftResult] = []

    for feature in self.features:
        ref_series = self.reference_data[feature]
        prod_series = production_data[feature]

        detector = self._detectors[feature]
        result = detector.detect(ref_series, prod_series)

        feature_results.append(
            FeatureDriftResult(
                feature_name=feature,
                has_drift=result.has_drift,
                score=result.score,
                method=result.method,
                threshold=result.threshold,
                p_value=result.p_value,
            )
        )

    return DriftReport(
        feature_results=feature_results,
        reference_size=len(self.reference_data),
        production_size=len(production_data),
    )
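Given the fields `FeatureDriftResult` carries in the loop above, pulling the drifted features out of a report is a one-liner over `report.feature_results`. A sketch using a stand-in namedtuple so it runs without `driftwatch` installed (in practice the objects come from `monitor.check(...)`):

```python
from collections import namedtuple

# Stand-in mirroring the FeatureDriftResult fields used in check() above;
# real code would iterate report.feature_results instead.
FeatureDriftResult = namedtuple(
    "FeatureDriftResult",
    ["feature_name", "has_drift", "score", "method", "threshold", "p_value"],
)

results = [
    FeatureDriftResult("age", True, 0.31, "psi", 0.2, None),
    FeatureDriftResult("income", False, 0.08, "psi", 0.2, None),
]

drifted = [r.feature_name for r in results if r.has_drift]
print(drifted)  # ['age']
```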

Configuration

Thresholds

Control sensitivity of drift detection:

monitor = Monitor(
    reference_data=train_df,
    thresholds={
        "psi": 0.15,           # More sensitive than default 0.2
        "ks_pvalue": 0.01,     # More strict than default 0.05
        "chi2_pvalue": 0.05,   # Default
        "wasserstein": 0.2,    # For Wasserstein detector
    }
)
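For intuition on what the `"psi"` threshold measures: the Population Stability Index sums `(p - q) * ln(p / q)` over shared bins of the reference and production distributions, and values above roughly 0.2 are conventionally read as significant shift. A minimal NumPy sketch of the statistic (not driftwatch's implementation; its binning and smoothing choices may differ):

```python
import numpy as np

def psi(reference, production, bins=10):
    """Population Stability Index over bins derived from the reference."""
    # Bin edges come from the reference distribution.
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_counts, _ = np.histogram(reference, bins=edges)
    prod_counts, _ = np.histogram(production, bins=edges)

    # Convert counts to proportions; a small epsilon keeps empty bins
    # from producing log(0) or division by zero.
    eps = 1e-6
    p = ref_counts / ref_counts.sum() + eps
    q = prod_counts / prod_counts.sum() + eps
    return float(np.sum((p - q) * np.log(p / q)))

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 10_000)
same = rng.normal(0.0, 1.0, 10_000)      # same distribution -> PSI near 0
shifted = rng.normal(0.8, 1.0, 10_000)   # shifted mean -> PSI well above 0.2

print(psi(ref, same) < 0.05, psi(ref, shifted) > 0.2)  # True True
```

Lowering the `"psi"` threshold from the default 0.2 to 0.15, as in the configuration above, simply flags drift at a smaller value of this statistic.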

Model Version Tracking

Track which model version is being monitored:

monitor = Monitor(
    reference_data=train_df,
    model_version="v1.2.3"
)

report = monitor.check(prod_df)
print(report.model_version)  # "v1.2.3"

Methods

check()

Detect drift in production data:

report = monitor.check(
    production_data=prod_df,
    # Optional: override thresholds per feature
    feature_thresholds={
        "age": {"psi": 0.1},  # More sensitive for age
    }
)

update_reference()

Update reference data (e.g., after model retraining):

# Retrain model with new data
new_train_df = pd.read_parquet("retrain_data.parquet")

# Update monitor
monitor.update_reference(new_train_df)

Best Practices

1. Choose Appropriate Reference Data

Reference data should represent your model's training distribution:

# ✓ Good: Use actual training data
monitor = Monitor(reference_data=train_df)

# ✗ Bad: Using validation data with different distribution
monitor = Monitor(reference_data=val_df)

2. Set Thresholds Based on Business Impact

# High-stakes model: strict thresholds
critical_monitor = Monitor(
    reference_data=train_df,
    thresholds={"psi": 0.1, "ks_pvalue": 0.01}
)

# Exploratory model: relaxed thresholds
exploratory_monitor = Monitor(
    reference_data=train_df,
    thresholds={"psi": 0.3}
)

3. Version Your Reference Data

import joblib

# Save monitor for reproducibility
joblib.dump(monitor, f"monitor_v{model_version}.pkl")

# Load later
monitor = joblib.load("monitor_v1.2.3.pkl")

See Also