Skip to content

CLI Guide

Use DriftWatch from the command line for batch processing and CI/CD integration.

Installation

pip install driftwatch[cli]

Commands

driftwatch check

Check for drift between reference and production datasets.

driftwatch check --ref REFERENCE --prod PRODUCTION [OPTIONS]

Arguments:

  • --ref, -r - Path to reference dataset (CSV or Parquet)
  • --prod, -p - Path to production dataset (CSV or Parquet)

Options:

  • --threshold-psi FLOAT - PSI threshold (default: 0.2)
  • --threshold-ks FLOAT - KS p-value threshold (default: 0.05)
  • --threshold-chi2 FLOAT - ChiΒ² p-value threshold (default: 0.05)
  • --output, -o PATH - Save report to JSON file

Exit Codes:

  • 0 - No drift detected (OK)
  • 1 - Drift detected (WARNING)
  • 2 - Critical drift (CRITICAL)

Examples

Basic Check

driftwatch check --ref train.csv --prod prod.csv

Output:

πŸ” DriftWatch - Drift Detection

Loading reference data from train.csv...
βœ“ Loaded 10,000 samples with 5 features

Loading production data from prod.csv...
βœ“ Loaded 2,500 samples with 5 features

Initializing monitor...
Running drift detection...

Status: WARNING
Drift Detected: 2/5 features
Drift Ratio: 40.0%

Feature Analysis:

┏━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Feature      ┃ Method ┃  Score ┃ Threshold ┃ Status   ┃
┑━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
β”‚ age          β”‚ psi    β”‚ 0.3521 β”‚    0.2000 β”‚ ⚠️ DRIFT β”‚
β”‚ income       β”‚ psi    β”‚ 0.1234 β”‚    0.2000 β”‚ βœ“ OK     β”‚
β”‚ credit_score β”‚ psi    β”‚ 0.2891 β”‚    0.2000 β”‚ ⚠️ DRIFT β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Custom Thresholds

driftwatch check \
  --ref train.parquet \
  --prod prod.parquet \
  --threshold-psi 0.15 \
  --threshold-ks 0.01

Save Report

driftwatch check \
  --ref train.csv \
  --prod prod.csv \
  --output drift_report.json

driftwatch report

Display a drift report from a JSON file.

driftwatch report REPORT_FILE [OPTIONS]

Arguments:

  • REPORT_FILE - Path to drift report JSON

Options:

  • --format, -f - Output format: table or json (default: table)
  • --output, -o PATH - Save output to file

CI/CD Integration

GitHub Actions

name: Drift Check

on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight

jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install driftwatch[cli]

      - name: Run drift check
        run: |
          driftwatch check \
            --ref data/train.parquet \
            --prod data/production_latest.parquet \
            --output drift_report.json

      - name: Upload report
        uses: actions/upload-artifact@v3
        if: always()
        with:
          name: drift-report
          path: drift_report.json

      - name: Notify on drift
        if: failure()
        run: |
          # Send notification
          echo "Drift detected! Check artifacts."

GitLab CI

drift_check:
  stage: test
  image: python:3.11
  script:
    - pip install driftwatch[cli]
    - |
      driftwatch check \
        --ref data/train.parquet \
        --prod data/production.parquet \
        --threshold-psi 0.15
  artifacts:
    paths:
      - drift_report.json
    when: always
  only:
    - schedules

Production Workflows

Daily Drift Monitoring

#!/bin/bash
# daily_drift_check.sh

# Download latest production data
aws s3 cp s3://my-bucket/production_data.parquet .

# Run drift check
driftwatch check \
  --ref data/train.parquet \
  --prod production_data.parquet \
  --output drift_report_$(date +%Y%m%d).json

# Alert if drift detected
if [ $? -ne 0 ]; then
  # Send to Slack, PagerDuty, etc.
  curl -X POST https://hooks.slack.com/... \
    -d "{\"text\": \"Drift detected on $(date)\"}"
fi

Pre-Deployment Check

#!/bin/bash
# Ensure no drift before deploying new model

driftwatch check \
  --ref data/train_v2.parquet \
  --prod data/validation.parquet \
  --threshold-psi 0.10

if [ $? -eq 0 ]; then
  echo "βœ“ No drift detected. Safe to deploy."
  # Deploy model
else
  echo "⚠️ Drift detected. Review before deploying."
  exit 1
fi

Tips & Tricks

1. Use Parquet for Speed

Parquet is much faster than CSV for large datasets:

# Convert CSV to Parquet
python -c "
import pandas as pd
pd.read_csv('large_file.csv').to_parquet('large_file.parquet')
"

# Use in drift check
driftwatch check --ref train.parquet --prod prod.parquet

2. Pipe to jq for JSON

driftwatch check --ref train.csv --prod prod.csv --output - | jq .

3. Check Specific Features Only

Filter your data before checking:

# filter_features.py
import pandas as pd
import sys

df = pd.read_parquet(sys.argv[1])
df[['age', 'income']].to_parquet(sys.argv[2])
python filter_features.py prod.parquet prod_filtered.parquet
driftwatch check --ref train_filtered.parquet --prod prod_filtered.parquet

Next Steps