Source Groups
Each entry in source_groups names one priority-ranked group of transformed source assets. Records are merged in priority order — lower value wins on conflict (1 = highest priority).
source_groups:
  - name: primary          # label used in {name}_exists / {name}_last_seen columns
    source: pss_dealers    # last segment of the transformed asset key
    priority: 1            # lower = higher precedence
    history_days: 365      # days of transformed partitions to scan (default 365)
Use sources: [key1, key2] instead of source: when multiple assets share the same schema and should be concatenated before the merge step.
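The merge rule can be sketched in plain Python. This is an illustration rather than the pipeline's implementation: the record key `vin`, the helper `merge_by_priority`, and the column-by-column conflict resolution are all assumptions made for the example.

```python
from typing import Any

Record = dict[str, Any]


def merge_by_priority(groups: list[dict[str, Any]], key: str = "vin") -> dict[str, Record]:
    """Merge records from several source groups; the lowest priority value wins.

    `groups` mirrors source_groups: each item has "name", "priority", "records".
    `key` is a hypothetical record identifier, not a name taken from the pipeline.
    """
    merged: dict[str, Record] = {}
    # Visit groups from highest precedence (priority 1) to lowest.
    for group in sorted(groups, key=lambda g: g["priority"]):
        for record in group["records"]:
            row = merged.setdefault(record[key], {})
            for column, value in record.items():
                # Keep any value already set by a higher-precedence group.
                row.setdefault(column, value)
            # Per-group indicator column, following the {name}_exists pattern.
            row[f"{group['name']}_exists"] = True
    return merged


groups = [
    {"name": "secondary", "priority": 2,
     "records": [{"vin": "A1", "dealer": "Old Name"}, {"vin": "B2", "dealer": "Beta"}]},
    {"name": "primary", "priority": 1,
     "records": [{"vin": "A1", "dealer": "Alpha"}]},
]
merged = merge_by_priority(groups)
```

Here `A1` appears in both groups, so its `dealer` value comes from `primary`, while `B2` is carried over from `secondary` unchanged.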
observation_date_col
By default, first_seen and last_seen are derived from the Dagster _partition_date. When a source carries its own authoritative observation timestamp, set observation_date_col to that column name and both date fields will be derived from it instead:
source_groups:
  - name: dc_active
    source: inventory_from_dealerconnect_bq_active
    priority: 1
    observation_date_col: last_seen_dc_active
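The effect of switching the date column can be illustrated with a small sketch. It assumes that first_seen and last_seen are the minimum and maximum of the chosen date column per record key; the key `vin` and the helper `seen_range` are hypothetical.

```python
from datetime import date


def seen_range(rows, key="vin", date_col="_partition_date"):
    """Return {key: (first_seen, last_seen)} as the min and max of date_col."""
    ranges = {}
    for row in rows:
        observed = row[date_col]
        first, last = ranges.get(row[key], (observed, observed))
        ranges[row[key]] = (min(first, observed), max(last, observed))
    return ranges


rows = [
    {"vin": "A1", "_partition_date": date(2025, 3, 10),
     "last_seen_dc_active": date(2025, 3, 8)},
    {"vin": "A1", "_partition_date": date(2025, 3, 11),
     "last_seen_dc_active": date(2025, 3, 9)},
]

# Default behaviour: dates come from the partition date.
by_partition = seen_range(rows)
# With observation_date_col set: dates come from the source's own column.
by_observation = seen_range(rows, date_col="last_seen_dc_active")
```

In this example the partition dates lag the source's own observation dates by two days, so the two calls return different ranges for the same vehicle.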
allow_nonexistent_upstream_partitions
Passed to Dagster's TimeWindowPartitionMapping. When true (default), the consolidated asset can materialize even if some upstream transformed partitions within the history window do not exist yet — Dagster treats missing partitions as empty rather than raising a dependency error. Set to false to require that every partition in the window is present before the downstream run executes.
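To make the two modes concrete, the following plain-Python sketch emulates the behaviour described above without calling Dagster. The helper `upstream_partitions` is hypothetical, and the window bounds (partition_date - history_days through partition_date, inclusive) are an assumption.

```python
from datetime import date, timedelta


def upstream_partitions(partition_date, history_days, existing, allow_nonexistent=True):
    """List the upstream partition dates to read for one downstream partition."""
    # Assumed window: partition_date - history_days .. partition_date, inclusive.
    window = [partition_date - timedelta(days=offset)
              for offset in range(history_days, -1, -1)]
    missing = [day for day in window if day not in existing]
    if missing and not allow_nonexistent:
        raise LookupError(
            f"{len(missing)} upstream partition(s) missing, first: {missing[0]}"
        )
    # Missing partitions are skipped, which is the same as treating them as empty.
    return [day for day in window if day in existing]


existing = {date(2025, 3, 1), date(2025, 3, 3)}  # 2025-03-02 was never materialized
selected = upstream_partitions(date(2025, 3, 3), 2, existing)
```

With the default setting the run proceeds on the two partitions that exist; passing `allow_nonexistent=False` for the same inputs raises `LookupError` instead.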
filter_window_boundary
Default: true.
Filters out records whose first_seen equals the window start date (partition_date - history_days). Such records have an artifactual first_seen because the vehicle was on lot before the history window opened and the true delivery date is unknown. Disable only when you need to preserve those records explicitly.
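A minimal sketch of this filter, assuming records are plain dicts with a `first_seen` date (the helper `drop_window_boundary` and the `vin` field are hypothetical):

```python
from datetime import date, timedelta


def drop_window_boundary(records, partition_date, history_days=365):
    """Drop records whose first_seen equals the window start date."""
    window_start = partition_date - timedelta(days=history_days)
    return [r for r in records if r["first_seen"] != window_start]


records = [
    {"vin": "A1", "first_seen": date(2024, 6, 1)},   # equals the window start: dropped
    {"vin": "B2", "first_seen": date(2024, 9, 15)},  # first seen inside the window: kept
]
kept = drop_window_boundary(records, partition_date=date(2025, 6, 1))
```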
gap_fill
Optional gap-fill for inventory source groups. When set, two boolean indicator columns (last_in_gap, first_in_gap) are added to the output and affected dates are estimated from the typical days-on-lot for each vehicle segment.
Two cases handled
last_in_gap — last_seen falls within a gap: the vehicle was sold during the outage so its true sale date is unknown. last_seen is replaced with first_seen + agg(days_on_lot) over clean records in the same segment, clamped so the estimate never moves last_seen earlier than observed and never projects past the gap end date (the vehicle must have sold before data collection resumed).
first_in_gap — first_seen is the day after a gap ends and the vehicle is not still on lot: the vehicle first appeared in the feed when collection resumed, meaning it was delivered during the gap. first_seen is replaced with last_seen - agg(days_on_lot), clamped so the estimate never moves first_seen later than observed and never projects before the gap start date (the vehicle must have been delivered after the outage began).
The aggregation excludes last_in_gap records (unknown sale date), first_in_gap records (unknown delivery date), and still-on-lot vehicles (incomplete duration). Records with no clean reference rows in their group are left with their original dates; the indicator columns still mark them.
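The two cases and the clamping rules can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline's code: records are dicts, `still_on_lot` is a hypothetical precomputed flag, gaps are `(start, end)` tuples, and the aggregated days-on-lot is rounded to whole days.

```python
from datetime import date, timedelta
from statistics import mean, median


def gap_fill(records, gaps, group_by, aggregation="mean"):
    """Return copies of records with gap indicator columns and estimated dates."""
    agg = {"mean": mean, "median": median}[aggregation]
    one_day = timedelta(days=1)

    def segment(r):
        return tuple(r[col] for col in group_by)

    # Pass 1: find the gap (if any) behind each of the two cases.
    flagged = []
    for r in records:
        last_gap = next((g for g in gaps if g[0] <= r["last_seen"] <= g[1]), None)
        first_gap = None
        if not r["still_on_lot"]:
            first_gap = next((g for g in gaps if r["first_seen"] == g[1] + one_day), None)
        flagged.append((r, last_gap, first_gap))

    # Pass 2: typical days on lot per segment, from clean records only.
    durations = {}
    for r, last_gap, first_gap in flagged:
        if last_gap is None and first_gap is None and not r["still_on_lot"]:
            days = (r["last_seen"] - r["first_seen"]).days
            durations.setdefault(segment(r), []).append(days)
    typical = {seg: timedelta(days=round(agg(d))) for seg, d in durations.items()}

    # Pass 3: estimate and clamp; records without a reference keep their dates.
    filled = []
    for r, last_gap, first_gap in flagged:
        new = dict(r, last_in_gap=last_gap is not None, first_in_gap=first_gap is not None)
        days_on_lot = typical.get(segment(r))
        if days_on_lot is not None:
            if last_gap is not None:
                estimate = r["first_seen"] + days_on_lot
                new["last_seen"] = min(max(estimate, r["last_seen"]), last_gap[1])
            if first_gap is not None:
                estimate = r["last_seen"] - days_on_lot
                new["first_seen"] = max(min(estimate, r["first_seen"]), first_gap[0])
        filled.append(new)
    return filled


def rec(vin, first, last, on_lot=False):
    return {"vin": vin, "model": "GLE", "trim": "AMG Line",
            "first_seen": first, "last_seen": last, "still_on_lot": on_lot}


records = [
    rec("C1", date(2025, 1, 1), date(2025, 1, 21)),    # clean, 20 days on lot
    rec("C2", date(2025, 1, 5), date(2025, 2, 4)),     # clean, 30 days on lot
    rec("L1", date(2025, 2, 12), date(2025, 3, 3)),    # last_seen inside the gap
    rec("F1", date(2025, 3, 15), date(2025, 3, 30)),   # first_seen the day after the gap
    rec("S1", date(2025, 3, 15), date(2025, 4, 10), on_lot=True),  # still on lot
]
filled = gap_fill(records, [(date(2025, 3, 1), date(2025, 3, 14))], ["model", "trim"])
by_vin = {r["vin"]: r for r in filled}
```

The two clean records give a mean of 25 days on lot, so `L1` gets a last_seen of 2025-03-09 and `F1` gets a first_seen of 2025-03-05, both within the clamping bounds. `S1` is neither flagged nor used as a reference because it is still on lot.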
Configuration
source_groups:
  - name: primary
    source: nafta_inventory
    priority: 1
    gap_fill:
      gaps:
        - start_date: "2025-03-01"   # inclusive
          end_date: "2025-03-14"     # inclusive
      group_by: [model, trim]        # vehicle segments for days-on-lot aggregation
      aggregation: mean              # mean or median (default: mean)
| Field | Description |
|---|---|
| gaps | One or more inclusive date ranges where data collection stopped. At least one gap is required. |
| group_by | Columns defining vehicle segments for the days-on-lot aggregation (e.g. model, trim). At least one column is required. |
| aggregation | "mean" (default) or "median", applied to (last_seen - first_seen) over clean records. median is more robust to outliers. |
| group_filter | Optional restriction of gap detection to specific group_by value combinations (see below). |
group_filter
By default, gap detection and filling apply to all group_by combinations. group_filter restricts them to specific combinations, which is useful when a gap affected only certain vehicle segments. Rows outside the filter are treated as clean records and may contribute to the reference aggregation.
Each inner list must have the same length as group_by and specifies exact column values to match:
gap_fill:
  gaps:
    - start_date: "2025-03-01"
      end_date: "2025-03-14"
  group_by: [model, trim]
  group_filter:
    - ["GLE", "AMG Line"]   # only the listed model+trim combinations are gap-filled
    - ["GLC", "AMG Line"]
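The matching rule for group_filter can be sketched like this. The helper `in_gap_scope` is hypothetical, and raising `ValueError` on a length mismatch is an assumption about how the constraint might be enforced.

```python
def in_gap_scope(record, group_by, group_filter=None):
    """Return True when gap detection should apply to this record."""
    if group_filter is None:
        return True  # no filter: every group_by combination is in scope
    for combo in group_filter:
        if len(combo) != len(group_by):
            raise ValueError(
                f"group_filter entry {combo!r} must have {len(group_by)} values"
            )
    # Exact match on the values of the group_by columns, in order.
    values = [record[col] for col in group_by]
    return any(values == list(combo) for combo in group_filter)


group_by = ["model", "trim"]
group_filter = [["GLE", "AMG Line"], ["GLC", "AMG Line"]]

gle_amg = in_gap_scope({"model": "GLE", "trim": "AMG Line"}, group_by, group_filter)
gle_base = in_gap_scope({"model": "GLE", "trim": "Base"}, group_by, group_filter)
```

A record that falls outside the filter, such as the `GLE` / `Base` row here, would skip gap detection and could still serve as a clean reference record.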