Monitoring
The platform uses Sentry for alerting on two categories of failure: data quality check failures and Dagster run crashes. Both are wired automatically via build_core_defs(); the only per-OEM configuration is the SENTRY_DSN environment variable.
Sensors
| Sensor | Trigger | Sentry level |
|---|---|---|
| asset_check_sentry_sensor | Failed asset check evaluation (WARN or ERROR) | warning / error |
| run_failure_sentry_sensor | Dagster run transitions to FAILED | error |
Both sensors start automatically (DefaultSensorStatus.RUNNING) after every deploy.
asset_check_sentry_sensor
Polls the Dagster event log every 60 seconds for new ASSET_CHECK_EVALUATION events. Uses a cursor (storage ID) so only evaluations recorded since the last tick are forwarded — not historical ones. Batches up to 100 events per tick and paginates if more are pending.
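The cursor behaviour described above can be sketched in plain Python. This is an illustration only, not the sensor's actual code: `CheckEvent`, `fetch_after`, and `tick` are hypothetical names, the in-memory list stands in for the Dagster event log, and the sketch assumes pagination happens within a single tick.

```python
from dataclasses import dataclass

BATCH_SIZE = 100  # per-batch limit taken from the description above


@dataclass(frozen=True)
class CheckEvent:
    storage_id: int   # increasing ID assigned by the event log
    check_name: str
    passed: bool


def fetch_after(log, cursor, limit):
    """Hypothetical event-log query: events with storage_id > cursor, oldest first."""
    newer = [e for e in sorted(log, key=lambda e: e.storage_id) if e.storage_id > cursor]
    return newer[:limit]


def tick(log, cursor, forward):
    """One sensor tick: forward failed evaluations newer than `cursor`, return the new cursor."""
    while True:
        batch = fetch_after(log, cursor, BATCH_SIZE)
        if not batch:
            return cursor
        for event in batch:
            if not event.passed:
                forward(event)
        cursor = batch[-1].storage_id  # advance past everything just seen
        if len(batch) < BATCH_SIZE:    # a short batch means nothing else is pending
            return cursor
```

Because the returned cursor is stored between ticks, calling `tick` again with no new events forwards nothing, which is why historical evaluations are never re-sent.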
run_failure_sentry_sensor
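Event-driven: Dagster invokes it once for each run that reaches FAILED status, so the sensor does no event-log polling of its own. Run status sensors are still evaluated by the Dagster daemon, so expect a short delay rather than an instant alert. Captures the job name, error message, OEM, and partition date.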
Environment Variable
Set SENTRY_DSN in each OEM project's .env file (or as a secret in Dagster Cloud):
SENTRY_DSN=https://<key>@<org>.ingest.sentry.io/<project-id>
If SENTRY_DSN is not set, asset_check_sentry_sensor returns a SkipReason and no events are sent. run_failure_sentry_sensor exits silently.
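A minimal sketch of the DSN gate, assuming the sensor reads the variable from the process environment. The function names are hypothetical, the returned strings stand in for Dagster's SkipReason and for the normal forwarding path, and treating a blank value as unset is an assumption rather than documented behaviour.

```python
import os


def sentry_dsn(env=None):
    """Return the configured DSN, or None when it is unset or blank (assumption)."""
    env = os.environ if env is None else env
    dsn = env.get("SENTRY_DSN", "").strip()
    return dsn or None


def asset_check_tick(env=None):
    """Mimic the documented behaviour: report a skip reason when no DSN is set."""
    if sentry_dsn(env) is None:
        return "skip: SENTRY_DSN is not set"  # stands in for Dagster's SkipReason
    return "forward"  # stands in for forwarding events to Sentry
```

Passing a dictionary instead of relying on `os.environ` keeps the check easy to exercise in a unit test.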
Sentry Event Structure
Asset check failure
| Field | Value |
|---|---|
| message | Dagster asset check failure: <check_name> on <asset_key> (<severity>) |
| level | warning (WARN) or error (ERROR) |
Tags (filterable in Sentry):
| Tag | Example |
|---|---|
| oem | audi |
| check_name | discrepancies |
| severity | WARN |
| partition_date | 2026-05-01 |
Extra (visible in issue detail):
| Field | Description |
|---|---|
| asset_key | Full asset key path |
| run_id | Dagster run ID; paste into the Dagster UI to inspect the run |
| partition_date | Partition date, if applicable |
| metadata | Check-specific metadata (e.g. count, discrepancies) |
Check-specific metadata
| Check | Metadata fields |
|---|---|
| discrepancies | records (up to 50 sample conflicting records) |
| integrity | count, missing_source_count, unexpected_row_count |
| unresolved_refs | count, sample_refs |
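The tables above can be read as a single payload. The sketch below assembles one as a plain dictionary; `build_asset_check_event` is a hypothetical helper, not the platform's function, and the asset key and run ID in the usage note are made-up examples.

```python
LEVEL_BY_SEVERITY = {"WARN": "warning", "ERROR": "error"}


def build_asset_check_event(check_name, asset_key, severity, oem, run_id,
                            partition_date=None, metadata=None):
    """Assemble message, level, tags, and extra as laid out in the tables above."""
    tags = {"oem": oem, "check_name": check_name, "severity": severity}
    extra = {"asset_key": asset_key, "run_id": run_id, "metadata": metadata or {}}
    if partition_date is not None:  # only set for partitioned assets
        tags["partition_date"] = partition_date
        extra["partition_date"] = partition_date
    return {
        "message": f"Dagster asset check failure: {check_name} on {asset_key} ({severity})",
        "level": LEVEL_BY_SEVERITY[severity],
        "tags": tags,
        "extra": extra,
    }
```

For example, `build_asset_check_event("integrity", "audi/vehicles", "WARN", "audi", "run-123", "2026-05-01", {"count": 4})` yields a warning-level event whose tags are filterable and whose metadata sits in extra.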
Run failure
| Field | Value |
|---|---|
| message | With step info: Dagster run failure: <asset_path> (<error_type>), where <asset_path> is the step_key rewritten as a slash-separated asset path. Fallback when no step ran: Dagster run failure: <code_location>/<job_name> (<error_class>) |
| level | error |
Tags: job_name, oem, partition_date (when present)
Extra: run_id, job_name, error (step error message), partition_date
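The two message forms can be sketched as follows. `run_failure_message` is a hypothetical helper, and the rewrite rule is an assumption: it treats the double underscore in a step key as the separator between asset key components, which may differ from what the sensor actually does.

```python
def run_failure_message(error_type, step_key=None, code_location=None, job_name=None):
    """Build the Sentry message for a failed run (illustrative only)."""
    if step_key:
        # Assumption: "__" separates asset key components in the step key.
        asset_path = step_key.replace("__", "/")
        return f"Dagster run failure: {asset_path} ({error_type})"
    # Fallback when the run failed before any step executed.
    return f"Dagster run failure: {code_location}/{job_name} ({error_type})"
```

Keeping the asset path in the message means failures of different assets within the same job land in separate Sentry issues, while the fallback form groups pre-step failures per job.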
Sentry Grouping
Sentry groups events into issues by message. The same check failing on the same asset every day accumulates as one issue with an incrementing occurrence count. Tags (not extra) are used in custom fingerprint rules — check_name, oem, and severity are in tags for this reason.
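To illustrate the grouping effect, the sketch below counts events by message. It is a deliberate simplification of Sentry's grouping that assumes the events carry no stack trace and no custom fingerprint; `group_by_message` and the sample events are hypothetical.

```python
from collections import Counter


def group_by_message(events):
    """Approximate message-based grouping: one issue per distinct message."""
    return Counter(event["message"] for event in events)


WARN_MSG = "Dagster asset check failure: integrity on audi/vehicles (WARN)"
ERROR_MSG = "Dagster asset check failure: integrity on audi/vehicles (ERROR)"

events = [
    {"message": WARN_MSG, "tags": {"partition_date": "2026-05-01"}},
    {"message": WARN_MSG, "tags": {"partition_date": "2026-05-02"}},
    {"message": WARN_MSG, "tags": {"partition_date": "2026-05-03"}},
    {"message": ERROR_MSG, "tags": {"partition_date": "2026-05-03"}},
]

issues = group_by_message(events)  # two issues; the WARN one has three occurrences
```

The partition date lives in tags rather than in the message, so daily repeats of the same failure stay in one issue, while a change in severity changes the message and opens a new one.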
Local Testing
To verify events reach Sentry without a real pipeline run (the environment variable is set with PowerShell syntax; adapt it for other shells):
# Asset check failure
$env:SENTRY_DSN = "https://..."
uv run --directory projects/ai_audi python ../../scripts/test_sentry_e2e.py
# Run failure
uv run --directory projects/ai_audi python ../../scripts/test_run_failure_e2e.py
Both scripts call the sensor helpers directly against an ephemeral Dagster instance — no dg dev required.