CLI Guide
Use DriftWatch from the command line for batch processing and CI/CD integration.
Installation
pip install driftwatch[cli]
Commands
driftwatch check
Check for drift between reference and production datasets.
Arguments:
--ref, -r - Path to reference dataset (CSV or Parquet)
--prod, -p - Path to production dataset (CSV or Parquet)
Options:
--threshold-psi FLOAT - PSI threshold (default: 0.2)
--threshold-ks FLOAT - KS p-value threshold (default: 0.05)
--threshold-chi2 FLOAT - Chi² p-value threshold (default: 0.05)
--output, -o PATH - Save report to JSON file
Exit Codes:
0 - No drift detected (OK)
1 - Drift detected (WARNING)
2 - Critical drift (CRITICAL)
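When wrapping the CLI in a script, the exit codes above can be mapped to named statuses. The following is a minimal Python sketch, assuming `driftwatch` is available on `PATH`. The helper names `status_from_exit_code` and `run_check` are illustrative and not part of DriftWatch, and treating any other exit code as `ERROR` is an assumption.

```python
import subprocess

# Exit codes documented for `driftwatch check`.
EXIT_STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def status_from_exit_code(code: int) -> str:
    """Map an exit code to its documented status; anything else is treated as ERROR."""
    return EXIT_STATUS.get(code, "ERROR")


def run_check(ref: str, prod: str) -> str:
    """Hypothetical wrapper: run `driftwatch check` and return the status label."""
    result = subprocess.run(["driftwatch", "check", "--ref", ref, "--prod", prod])
    return status_from_exit_code(result.returncode)
```

Calling `run_check("train.csv", "prod.csv")` would then return one of the status labels, so callers can handle WARNING and CRITICAL differently instead of treating every non-zero exit the same way.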
Examples
Basic Check
driftwatch check --ref train.csv --prod prod.csv
Output:
DriftWatch - Drift Detection
Loading reference data from train.csv...
✓ Loaded 10,000 samples with 5 features
Loading production data from prod.csv...
✓ Loaded 2,500 samples with 5 features
Initializing monitor...
Running drift detection...
Status: WARNING
Drift Detected: 2/5 features
Drift Ratio: 40.0%
Feature Analysis:
┏━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Feature      ┃ Method ┃ Score  ┃ Threshold ┃ Status   ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ age          │ psi    │ 0.3521 │ 0.2000    │ ⚠️ DRIFT │
│ income       │ psi    │ 0.1234 │ 0.2000    │ ✓ OK     │
│ credit_score │ psi    │ 0.2891 │ 0.2000    │ ⚠️ DRIFT │
└──────────────┴────────┴────────┴───────────┴──────────┘
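For readers unfamiliar with the `psi` scores in the table, the sketch below computes the Population Stability Index in its standard textbook form: the sum over shared histogram bins of (q - p) * ln(q / p), where p and q are the reference and production bin proportions. This is an illustration only. DriftWatch's own binning and smoothing are not described in this guide, and the function name, the bin count, and the `eps` floor are assumptions.

```python
import math


def psi(reference, production, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples (textbook form)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    p = proportions(reference)
    q = proportions(production)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical samples give a PSI of 0, and the score grows as the production distribution moves away from the reference, which is why a feature is flagged once its score exceeds the threshold.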
Custom Thresholds
driftwatch check \
--ref train.parquet \
--prod prod.parquet \
--threshold-psi 0.15 \
--threshold-ks 0.01
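Note that the two thresholds in this example likely move in opposite directions. PSI is a divergence score, so lowering its threshold flags more features, which matches the sample output where scores above 0.2 are marked as drift. KS and Chi² thresholds apply to p-values, where the usual convention is to flag drift when the p-value falls below the threshold, so lowering it flags fewer features. The sketch below assumes DriftWatch follows that convention. The function and the method names `ks` and `chi2` are illustrative, not the library's code.

```python
def is_drift(method: str, value: float, threshold: float) -> bool:
    """Decide drift for one feature under conventional threshold semantics (assumed)."""
    if method == "psi":
        # PSI is a divergence score: larger means more drift.
        return value > threshold
    if method in ("ks", "chi2"):
        # KS and Chi² report p-values: smaller means stronger evidence of drift.
        return value < threshold
    raise ValueError(f"unknown method: {method}")
```

Under these assumptions, a KS p-value of 0.03 counts as drift at the default threshold of 0.05 but not at 0.01.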
Save Report
driftwatch check --ref train.csv --prod prod.csv --output drift_report.json
driftwatch report
Display a drift report from a JSON file.
Arguments:
REPORT_FILE - Path to drift report JSON
Options:
--format, -f - Output format: table or json (default: table)
--output, -o PATH - Save output to file
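This guide does not document the layout of the report JSON, so before writing tooling against it, it may help to inspect a saved report first. The sketch below is a generic helper that lists the top-level keys of a JSON file with the type of each value. It assumes only that the report is a JSON object at the top level. The function name `summarize_json` is hypothetical.

```python
import json


def summarize_json(path: str) -> dict:
    """Return each top-level key of a JSON object mapped to its value's type name."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in data.items()}
```

Running `summarize_json("drift_report.json")` on a report saved with `--output` shows which fields are available without guessing at their names.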
CI/CD Integration
GitHub Actions
name: Drift Check
on:
  schedule:
    - cron: '0 0 * * *' # Daily at midnight UTC
jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install driftwatch[cli]
      - name: Run drift check
        run: |
          driftwatch check \
            --ref data/train.parquet \
            --prod data/production_latest.parquet \
            --output drift_report.json
      - name: Upload report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: drift-report
          path: drift_report.json
      - name: Notify on drift
        if: failure()
        run: |
          # Send notification
          echo "Drift detected! Check artifacts."
GitLab CI
drift_check:
  stage: test
  image: python:3.11
  script:
    - pip install driftwatch[cli]
    - |
      driftwatch check \
        --ref data/train.parquet \
        --prod data/production.parquet \
        --threshold-psi 0.15 \
        --output drift_report.json
  artifacts:
    paths:
      - drift_report.json
    when: always
  only:
    - schedules
Production Workflows
Daily Drift Monitoring
#!/bin/bash
# daily_drift_check.sh

# Download latest production data; stop if the download fails
aws s3 cp s3://my-bucket/production_data.parquet . || exit 1

# Run drift check and capture its exit code (1 = drift, 2 = critical drift)
driftwatch check \
  --ref data/train.parquet \
  --prod production_data.parquet \
  --output "drift_report_$(date +%Y%m%d).json"
status=$?

# Alert if drift detected
if [ "$status" -ne 0 ]; then
  # Send to Slack, PagerDuty, etc.
  curl -X POST https://hooks.slack.com/... \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"Drift detected on $(date)\"}"
fi
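If the monitoring job is written in Python rather than shell, the dated report name and the alert body from the script above can be built as follows. This is a sketch under assumptions: the function names are hypothetical, and including the status label in the message is an addition for illustration. Using `json.dumps` avoids the hand-escaped quotes needed in the shell version.

```python
import json
from datetime import date


def report_filename(day: date) -> str:
    """Mirror the shell script's drift_report_$(date +%Y%m%d).json naming."""
    return f"drift_report_{day:%Y%m%d}.json"


def slack_payload(day: date, status: str) -> str:
    """Build the webhook body with json.dumps so quotes are escaped correctly."""
    return json.dumps({"text": f"Drift detected on {day.isoformat()} (status: {status})"})
```

The returned string can be sent as the request body to the same webhook URL used in the shell script.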
Pre-Deployment Check
#!/bin/bash
# Ensure no drift before deploying new model
driftwatch check \
  --ref data/train_v2.parquet \
  --prod data/validation.parquet \
  --threshold-psi 0.10

if [ $? -eq 0 ]; then
  echo "✓ No drift detected. Safe to deploy."
  # Deploy model
else
  echo "⚠️ Drift detected. Review before deploying."
  exit 1
fi
Tips & Tricks
1. Use Parquet for Speed
Parquet is a columnar binary format that stores column types, so it typically loads much faster than CSV for large datasets:
# Convert CSV to Parquet
python -c "
import pandas as pd
pd.read_csv('large_file.csv').to_parquet('large_file.parquet')
"
# Use in drift check
driftwatch check --ref train.parquet --prod prod.parquet
2. Pipe to jq for JSON
driftwatch report drift_report.json --format json | jq '.'
3. Check Specific Features Only
Filter your data before checking:
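# filter_features.py
import sys

import pandas as pd

# Usage: python filter_features.py INPUT.parquet OUTPUT.parquet
df = pd.read_parquet(sys.argv[1])
df[['age', 'income']].to_parquet(sys.argv[2])

# Apply the same filter to both datasets, then compare them
python filter_features.py train.parquet train_filtered.parquet
python filter_features.py prod.parquet prod_filtered.parquet
driftwatch check --ref train_filtered.parquet --prod prod_filtered.parquet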
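Since `driftwatch check` also accepts CSV input, the same filtering idea can be sketched without pandas using only the standard library. The function name `filter_columns` is hypothetical, and the column names are the ones used in the example above.

```python
import csv


def filter_columns(src, dst, columns):
    """Copy only the named columns from one CSV stream to another."""
    reader = csv.DictReader(src)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # extrasaction="ignore" drops every column that is not listed in `columns`.
    writer = csv.DictWriter(
        dst, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```

For files on disk, open both with `newline=""` and pass the file objects, for example `filter_columns(open("prod.csv", newline=""), open("prod_filtered.csv", "w", newline=""), ["age", "income"])`. Apply the same column list to the reference and production files so both datasets share the same features.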