Ingestion Design
The ingestion layer extracts raw data from OEM-specific external APIs, validates it, and progressively refines it into consolidated entity records partitioned by day. Each OEM runs its own independent Dagster deployment; shared infrastructure lives in a common library so OEM pipelines don't share state or affect each other.
Pipeline Structure
Every ingestion pipeline follows the same three-stage structure:
| Stage | Responsibility |
|---|---|
| Raw | Fetch data from the source API and store the response exactly as returned |
| Transformed | Extract entity records from the raw response, deduplicate by key group, and validate |
| Consolidated | Merge transformed records across all sources for an entity type |
Each stage has asset checks that run automatically after materialization.
Dependency Ordering
The asset DAG has two independent ordering dimensions that do not form cycles:
-
Source dependencies operate in the raw and transformed tiers. One raw source may depend on another (e.g. fetching search results requires dealer IDs from a prior fetch). These dependencies are fully resolved by the end of the transformed tier.
-
Entity dependencies begin at the consolidated tier, where
foreign_keysallow one entity's consolidation to reference another consolidated entity's same-partition data. For example, consolidated inventory can reference consolidated models to attach a resolved model ID to each row. Since all source-level dependencies are already resolved, entity ordering is independent of source ordering.
This separation guarantees no circular dependencies: sources never depend on entities, and entities never depend on sources.
Asset Key Conventions
Dagster asset keys form a path hierarchy using / as the separator:
| Stage | Pattern | Example |
|---|---|---|
| Source | <oem>/sources/<source>_<resource> | audi/sources/scs_search |
| Raw | <oem>/raw/<source>_<resource> | audi/raw/scs_search |
| Transformed | <oem>/transformed/<name>_<entity> | audi/transformed/scs_inventory |
| Consolidated | <oem>/consolidated/<entity> | audi/consolidated/inventory |
Asset Checks
| Stage | Check name | What it verifies | Blocking |
|---|---|---|---|
| Raw | fetch_attempts | All HTTP requests made by the fetcher returned 2xx; per-status failure patterns surfaced in metadata | No |
| Transformed | discrepancies | No conflicting rows for the same entity ID; same-source field disagreements grouped into discrepancy patterns | No |
| Consolidated | integrity | Every row has a non-null entity ID; records row count | No |
| Consolidated | unresolved_refs | Cross-entity reference links resolved successfully | No |
| Transformed (highest-priority source) | cross_source_discrepancies | Cross-source merges produce no conflicting field values for the same entity; field-level disagreements grouped into discrepancy patterns | No |
All checks are non-blocking (severity WARN) — they surface issues in the Dagster UI without aborting the run. The fetch_attempts check additionally reports tracking=instrumented or tracking=uninstrumented in its metadata so operators can tell whether the source module is recording per-request outcomes (see Component Reference).
Asset Groups and Tags
Source observability assets are grouped sources. Raw fetch assets are grouped raw. Transformed assets are grouped transformed. Consolidated assets are grouped consolidated.
Groups are visible in the Dagster UI's asset catalog and can be used to filter and select assets in bulk.
Every ingestion asset also carries an oem tag for per-OEM filtering:
oem: audi
Partitioning
All ingestion assets use a DailyPartitionsDefinition with end_offset=1. The end_offset=1 means today's date is always a valid partition, so assets can be materialized on the day they are due without waiting for the day to roll over.