Monitoring

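The platform uses Sentry for alerting on two categories of failure: data quality check failures and Dagster run crashes. Both sensors are wired automatically via build_core_defs(), so no per-OEM code or sensor configuration is needed; the only per-project setting is the SENTRY_DSN environment variable described below.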

Sensors

| Sensor | Trigger | Sentry level |
| --- | --- | --- |
| asset_check_sentry_sensor | Failed asset check evaluation (WARN or ERROR) | warning / error |
| run_failure_sentry_sensor | Dagster run transitions to FAILED | error |

Both sensors start automatically (DefaultSensorStatus.RUNNING) after every deploy.

asset_check_sentry_sensor

Polls the Dagster event log every 60 seconds for new ASSET_CHECK_EVALUATION events. Uses a cursor (storage ID) so only evaluations recorded since the last tick are forwarded — not historical ones. Batches up to 100 events per tick and paginates if more are pending.
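
The cursor and batching behaviour can be illustrated with a minimal, standard-library sketch. It does not use the Dagster API: EventRecord, fetch_after, and tick are hypothetical names, and the in-memory list stands in for the event log. The text does not say whether the real sensor drains all pending pages in one tick or spreads them across ticks; this sketch assumes the former.

```python
from dataclasses import dataclass

BATCH_SIZE = 100  # page size per query, matching the "up to 100 events" figure


@dataclass(frozen=True)
class EventRecord:
    storage_id: int  # monotonically increasing ID assigned by the event log
    check_name: str


def fetch_after(log, cursor, limit):
    """Hypothetical event-log query: records with storage_id > cursor, oldest first."""
    newer = [r for r in log if r.storage_id > cursor]
    return sorted(newer, key=lambda r: r.storage_id)[:limit]


def tick(log, cursor):
    """One sensor tick: forward everything newer than the cursor, page by page."""
    forwarded = []
    while True:
        page = fetch_after(log, cursor, BATCH_SIZE)
        if not page:
            break
        forwarded.extend(page)
        cursor = page[-1].storage_id  # later pages and later ticks skip these
        if len(page) < BATCH_SIZE:
            break  # a short page means nothing else is pending
    return forwarded, cursor
```

Because the returned cursor is persisted between ticks, a second call with the same log forwards nothing, which is how historical evaluations are kept out of Sentry.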

run_failure_sentry_sensor

Event-driven — Dagster calls it immediately when a run reaches FAILED status. No polling lag. Captures the job name, error message, OEM, and partition date.

Environment Variable

Set SENTRY_DSN in each OEM project's .env file (or as a secret in Dagster Cloud):

SENTRY_DSN=https://<key>@<org>.ingest.sentry.io/<project-id>

If SENTRY_DSN is not set, asset_check_sentry_sensor returns a SkipReason and no events are sent. run_failure_sentry_sensor exits silently.
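
The two no-DSN behaviours described above can be sketched as follows. This is an illustration only: resolve_dsn, asset_check_tick, and on_run_failure are hypothetical names, the returned string stands in for a Dagster SkipReason, and treating a blank value as unset is an assumption.

```python
import os


def resolve_dsn(environ=None):
    """Return the configured DSN, or None when SENTRY_DSN is unset or blank."""
    env = os.environ if environ is None else environ
    dsn = env.get("SENTRY_DSN", "").strip()  # ASSUMPTION: blank counts as unset
    return dsn or None


def asset_check_tick(environ=None):
    """Polling path: explain why the tick sent nothing."""
    if resolve_dsn(environ) is None:
        return "skip: SENTRY_DSN is not set"  # stands in for a SkipReason
    return "forward"


def on_run_failure(environ=None):
    """Run-failure path: return quietly without sending anything."""
    if resolve_dsn(environ) is None:
        return None
    return "forward"
```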

Sentry Event Structure

Asset check failure

| Field | Value |
| --- | --- |
| message | Dagster asset check failure: <check_name> on <asset_key> (<severity>) |
| level | warning (WARN) or error (ERROR) |

Tags (filterable in Sentry):

| Tag | Example |
| --- | --- |
| oem | audi |
| check_name | discrepancies |
| severity | WARN |
| partition_date | 2026-05-01 |

Extra (visible in issue detail):

| Field | Description |
| --- | --- |
| asset_key | Full asset key path |
| run_id | Dagster run ID; paste into the Dagster UI to inspect the run |
| partition_date | Partition date, if applicable |
| metadata | Check-specific metadata (e.g. count, discrepancies) |
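
Put together, the message, level, tags, and extra fields listed above could be assembled roughly as in this sketch. build_asset_check_event is a hypothetical helper, not the platform's own function; rendering the asset key as a slash-joined path is an assumption, and the asset key and run ID in the usage note are made-up examples.

```python
LEVEL_BY_SEVERITY = {"WARN": "warning", "ERROR": "error"}


def build_asset_check_event(check_name, asset_key, severity, oem, run_id,
                            partition_date=None, metadata=None):
    """Assemble a Sentry-style payload for one failed asset check."""
    asset_path = "/".join(asset_key)  # ASSUMPTION: key path rendered with slashes
    tags = {"oem": oem, "check_name": check_name, "severity": severity}
    extra = {"asset_key": asset_path, "run_id": run_id, "metadata": metadata or {}}
    if partition_date is not None:
        # partition_date appears in both tags and extra, only when present
        tags["partition_date"] = partition_date
        extra["partition_date"] = partition_date
    return {
        "message": f"Dagster asset check failure: {check_name} on {asset_path} ({severity})",
        "level": LEVEL_BY_SEVERITY[severity],
        "tags": tags,
        "extra": extra,
    }
```

For example, build_asset_check_event("discrepancies", ["audi", "vehicles"], "WARN", "audi", "run-123", "2026-05-01") yields a warning-level event whose message contains no date, so the same check failing on the same asset on different days produces an identical message.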

Check-specific metadata

| Check | Metadata fields |
| --- | --- |
| discrepancies | records (up to 50 sample conflicting records) |
| integrity | count, missing_source_count, unexpected_row_count |
| unresolved_refs | count, sample_refs |

Run failure

| Field | Value |
| --- | --- |
| message | With step info: Dagster run failure: <asset_path> (<error_type>), where the step_key is rewritten as a slash-separated asset path. Fallback when no step ran: Dagster run failure: <code_location>/<job_name> (<error_class>) |
| level | error |

Tags: job_name, oem, partition_date (when present)

Extra: run_id, job_name, error (step error message), partition_date
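
The two message forms can be sketched as below. run_failure_message is a hypothetical helper, and the rewrite rule is an assumption: the text only says the step_key becomes a slash-separated asset path, so this sketch guesses that path components are joined with double underscores in the step key.

```python
def run_failure_message(error_name, step_key=None, code_location=None, job_name=None):
    """Format the Sentry message for a failed run."""
    if step_key:
        # ASSUMPTION: "a__b__c" in the step key corresponds to asset path a/b/c
        asset_path = step_key.replace("__", "/")
        return f"Dagster run failure: {asset_path} ({error_name})"
    # Fallback when no step ran: identify the run by code location and job
    return f"Dagster run failure: {code_location}/{job_name} ({error_name})"
```

The fallback branch exists because a run can fail before any step starts (for example during launch), in which case there is no step_key to derive an asset path from.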

Sentry Grouping

Sentry groups events into issues by message. Because the message contains no date, the same check failing on the same asset every day accumulates as one issue with an incrementing occurrence count. Custom fingerprint rules can match on tags but not on extra data, which is why check_name, oem, and severity are sent as tags.
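
The fingerprint rules mentioned above are configured in the Sentry project. As a swapped-in illustration of the same tag-based grouping, the sketch below does it client-side instead, in the shape of a Sentry SDK before_send hook (a function taking the event dict and a hint and returning the event). It is not something the platform is stated to do; add_fingerprint and the "asset-check" prefix are hypothetical.

```python
def add_fingerprint(event, hint):
    """Group asset check failures by (oem, check_name, severity) instead of by message."""
    tags = event.get("tags") or {}
    if "check_name" in tags:
        event["fingerprint"] = [
            "asset-check",  # hypothetical constant prefix for this event family
            tags.get("oem", "unknown"),
            tags["check_name"],
            tags.get("severity", "unknown"),
        ]
    # Events without a check_name tag (e.g. run failures) keep default grouping
    return event
```

A hook like this would be passed as the before_send option when the SDK is initialised. Only values stored in tags are read here, which mirrors the reason the grouping-relevant fields are kept in tags rather than extra.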

Local Testing

To verify events reach Sentry without a real pipeline run:

# Set the DSN once for the session (PowerShell syntax)
$env:SENTRY_DSN = "https://..."

# Asset check failure
uv run --directory projects/ai_audi python ../../scripts/test_sentry_e2e.py

# Run failure
uv run --directory projects/ai_audi python ../../scripts/test_run_failure_e2e.py

Both scripts call the sensor helpers directly against an ephemeral Dagster instance — no dg dev required.