
Enrichment

Prepares consolidated data for downstream analysis. Dealer Classification groups dealers into spatial clusters used as coarser-geography dimensions; Inventory Enrichment adds computed columns — sold-date, days-on-lot, half-life weights — that all downstream statistics, models, and take-rate components rely on.

Dealer Classification

Groups dealers into spatial clusters ("neighborhoods") so coarser-geography statistics can be computed without relying on administrative boundaries such as state or region. The component uses grid-based spatial partitioning driven entirely by dealer latitude/longitude from the consolidated dealers data.
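The exact partitioning rule and label format are not spelled out here, so the following is only a minimal sketch under stated assumptions: each dealer is assigned to the square cell obtained by flooring latitude and longitude by grid_size (in degrees), and the label uses a hypothetical "r<row>_c<col>" format. Dealer, grid_label, and classify are illustrative names, not part of ai_core.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dealer:
    dealer_id: str
    lat: float
    lon: float


def grid_label(lat: float, lon: float, grid_size: float) -> str:
    """Label for the grid cell containing (lat, lon).

    Hypothetical scheme: floor each coordinate to a multiple of grid_size
    degrees and encode the resulting row/column indices as a string.
    """
    row = math.floor(lat / grid_size)
    col = math.floor(lon / grid_size)
    return f"r{row}_c{col}"


def classify(dealers: list[Dealer], grid_size: float) -> dict[str, str]:
    """Map dealer_id -> cluster label, one entry per dealer."""
    return {d.dealer_id: grid_label(d.lat, d.lon, grid_size) for d in dealers}


if __name__ == "__main__":
    dealers = [
        Dealer("D1", 40.7, -74.0),   # New York area
        Dealer("D2", 42.4, -71.1),   # Boston area
        Dealer("D3", 34.0, -118.2),  # Los Angeles area
    ]
    print(classify(dealers, grid_size=5))
    # {'D1': 'r8_c-15', 'D2': 'r8_c-15', 'D3': 'r6_c-24'}
```

With grid_size=5 the New York and Boston dealers share a cell; with grid_size=1 they land in different cells, which is the "smaller values produce finer-grained clusters" behavior. Because cells are measured in degrees, a cell covers less east-west distance at higher latitudes.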

The output is a single <oem>/dealers/<name> asset — one row per dealer with a cluster label column.

Configuration

```yaml
type: ai_core.components.DealerClassificationComponent
attributes:
  oem: mb
  name: neighborhoods
  label: neighborhood
  grid_size: 5
  start_date: "2025-01-01"
```

| Field | Description |
|---|---|
| name | Asset key segment under <oem>/dealers/<name> and the Dagster group name |
| label | Column name for the cluster label in the output asset |
| grid_size | Grid cell size in degrees. Smaller values produce finer-grained clusters |

Output schema

[dealer_id, <label>, _partition_date]

The <label> column (e.g. neighborhood) contains a string identifier for each cluster.

Downstream usage

Inventory Statistics, Take Rates, and Days on Lot join the classification asset through entries in their joins configuration, keyed on dealer_id. This lets those components group or fan out statistics by neighborhood without storing the cluster label on each vehicle row.
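The join pattern can be illustrated in plain Python. This is a sketch only: the real components join Dagster assets, the row dictionaries, vin values, and cluster labels are hypothetical, and inner-join semantics (vehicles whose dealer has no classification row are dropped) is an assumption rather than documented behavior.

```python
from collections import Counter


def join_labels(vehicles: list[dict], dealer_labels: dict[str, str],
                label: str = "neighborhood") -> list[dict]:
    """Attach each vehicle's cluster label via dealer_id (assumed inner join).

    Returns new dicts; the input vehicle rows are not mutated.
    """
    joined = []
    for row in vehicles:
        cluster = dealer_labels.get(row["dealer_id"])
        if cluster is not None:  # drop vehicles whose dealer is unclassified
            joined.append({**row, label: cluster})
    return joined


def count_by(rows: list[dict], key: str) -> Counter:
    """Count rows per value of key, e.g. vehicles per neighborhood."""
    return Counter(row[key] for row in rows)


if __name__ == "__main__":
    dealer_labels = {"D1": "cell_a", "D2": "cell_a", "D3": "cell_b"}
    vehicles = [
        {"vin": "V1", "dealer_id": "D1"},
        {"vin": "V2", "dealer_id": "D2"},
        {"vin": "V3", "dealer_id": "D3"},
        {"vin": "V4", "dealer_id": "D9"},  # dealer missing from the classification
    ]
    print(count_by(join_labels(vehicles, dealer_labels), "neighborhood"))
```

The label lives only in the small per-dealer table; vehicle rows pick it up at join time, so re-running classification with a different grid_size does not require rewriting inventory.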

Inventory Enrichment

Adds computed columns to consolidated inventory so downstream assets — statistics, models, take rates — can filter and weight rows without re-deriving sold-date logic each time.

The output is an <oem>/inventory_enriched asset with the same rows as consolidated inventory plus extra columns defined as Polars-SQL expressions in YAML, for example an inferred sold date, a sold-in-window flag, and half-life weights for recency-weighted aggregations.

Field expressions

Each field has a name and a Polars-SQL expr. Expressions evaluate against the full consolidated inventory frame (every column is in scope) and apply sequentially, so a later expression can reference a column produced by an earlier one:

```yaml
type: ai_core.components.InventoryEnrichmentComponent
attributes:
  oem: mb
  name: inventory_enriched
  start_date: "2025-01-01"
  fields:
    - name: inferred_sold_date
      expr: >-
        CASE
          WHEN netstar_official_sold_date IS NOT NULL
            THEN netstar_official_sold_date
          ELSE GREATEST(nafta_last_seen, netstar_last_seen)
        END
    - name: days_on_lot
      expr: >-
        CASE
          WHEN inferred_sold_date IS NOT NULL
            THEN DATEDIFF('day', arrival_date, inferred_sold_date)
          ELSE NULL
        END
    - name: is_sold_in_window
      expr: >-
        inferred_sold_date IS NOT NULL
        AND inferred_sold_date >= DATE('{partition_date}') - INTERVAL 365 DAYS
        AND inferred_sold_date < DATE('{partition_date}')
    - name: is_in_inventory
      expr: inferred_sold_date IS NULL
    - name: hl90_weight
      expr: >-
        CASE
          WHEN is_sold_in_window
            THEN POW(2.0, -DATEDIFF('day', inferred_sold_date, DATE('{partition_date}')) / 90.0)
          ELSE NULL
        END
```
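To make the sequential semantics concrete, here is a row-at-a-time Python sketch of the five expressions above. It is illustrative only (the component evaluates Polars SQL over the whole frame) and assumes date-typed columns, that GREATEST skips NULL arguments, and that DATEDIFF('day', a, b) equals (b - a).days. greatest and enrich_row are hypothetical names.

```python
from datetime import date, timedelta
from typing import Optional


def greatest(*values: Optional[date]) -> Optional[date]:
    """GREATEST over non-NULL arguments (NULL-skipping is an assumption)."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def enrich_row(row: dict, partition_date: date) -> dict:
    """Apply the example fields in order; later fields read earlier outputs."""
    out = dict(row)  # every source column stays in scope

    # inferred_sold_date: official sold date if present, else latest last-seen date
    official = out["netstar_official_sold_date"]
    out["inferred_sold_date"] = (
        official
        if official is not None
        else greatest(out["nafta_last_seen"], out["netstar_last_seen"])
    )
    sold = out["inferred_sold_date"]

    # days_on_lot: DATEDIFF('day', arrival_date, inferred_sold_date), NULL if unsold
    out["days_on_lot"] = (sold - out["arrival_date"]).days if sold is not None else None

    # is_sold_in_window: partition_date - 365 days <= sold < partition_date
    window_start = partition_date - timedelta(days=365)
    out["is_sold_in_window"] = sold is not None and window_start <= sold < partition_date

    # is_in_inventory: no sold date could be inferred
    out["is_in_inventory"] = sold is None

    # hl90_weight: reads is_sold_in_window, computed one step earlier
    out["hl90_weight"] = (
        2.0 ** (-(partition_date - sold).days / 90.0) if out["is_sold_in_window"] else None
    )
    return out


if __name__ == "__main__":
    row = {
        "netstar_official_sold_date": None,
        "nafta_last_seen": date(2025, 1, 10),
        "netstar_last_seen": date(2025, 1, 15),
        "arrival_date": date(2024, 12, 1),
    }
    enriched = enrich_row(row, partition_date=date(2025, 4, 15))
    print(enriched["inferred_sold_date"], enriched["days_on_lot"], enriched["hl90_weight"])
    # 2025-01-15 45 0.5
```

In the example the vehicle was last seen 90 days before the partition date, so its hl90_weight is exactly one half-life: 0.5.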

{partition_date} interpolation

Any expr may contain {partition_date}, which is replaced at materialization time with the current partition key (e.g. 2025-04-15). Sold-window flags and half-life decay weights depend on this, because both are measured relative to the partition date rather than to the wall-clock date of the run.

Expressions that do not contain {partition_date} are unaffected.

Note: placeholders in {...} follow Python str.format() syntax, so write a literal brace as {{ or }}.
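A minimal sketch of the substitution, assuming the component calls str.format(partition_date=...) on each expression; render_expr is a hypothetical name, and the real implementation may differ in detail.

```python
def render_expr(expr: str, partition_date: str) -> str:
    """Substitute {partition_date} using Python str.format() semantics.

    Doubled braces ({{ and }}) come out as single literal braces; an
    expression with no braces at all is returned unchanged.
    """
    return expr.format(partition_date=partition_date)


if __name__ == "__main__":
    print(render_expr("inferred_sold_date < DATE('{partition_date}')", "2025-04-15"))
    # inferred_sold_date < DATE('2025-04-15')
    print(render_expr("'{{not a placeholder}}'", "2025-04-15"))
    # '{not a placeholder}'
```

Under this assumption, an unescaped literal such as '{oops}' would make str.format() raise KeyError, which is why literal braces must be doubled.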

Half-life weights

Half-life weights are gated on is_sold_in_window (a 365-day look-back ending the day before the partition date) and decay continuously with sale age inside that window. With days_since_sale = DATEDIFF('day', inferred_sold_date, partition_date):

```text
hl30_weight  = 2^(-days_since_sale / 30)    # aggressive recency; 30 days ago → 0.5
hl90_weight  = 2^(-days_since_sale / 90)    # moderate recency; 90 days ago → 0.5
hl180_weight = 2^(-days_since_sale / 180)   # broad window; 180 days ago → 0.5
```

Vehicles outside the window, including unsold vehicles, receive NULL. Within the window, the half-life (the divisor, in days) controls how steeply weight decays with age. Each weight column powers one perspective in Take Rates.
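The gating and decay can be checked numerically with a short sketch. It works in days_since_sale rather than dates; the bounds (more than 0 and at most 365 days) are derived from the is_sold_in_window expression above, and hl_weight and WINDOW_DAYS are illustrative names, not part of the component.

```python
from typing import Optional

WINDOW_DAYS = 365  # look-back of is_sold_in_window


def hl_weight(days_since_sale: Optional[int], half_life: float) -> Optional[float]:
    """2^(-days_since_sale / half_life), or None (NULL) outside the window.

    Mirrors is_sold_in_window: a sale on the partition date itself (0 days)
    is excluded, and the oldest included sale is 365 days old.
    """
    if days_since_sale is None or not (0 < days_since_sale <= WINDOW_DAYS):
        return None
    return 2.0 ** (-days_since_sale / half_life)


if __name__ == "__main__":
    for days in (30, 90, 180, 365, 400):
        weights = {hl: hl_weight(days, hl) for hl in (30, 90, 180)}
        print(days, weights)
```

A 90-day-old sale gets 0.125 under hl30, 0.5 under hl90, and about 0.71 under hl180, which is why the three columns give distinctly different recency perspectives on the same sales.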

Source asset

InventoryEnrichmentComponent always reads <oem>/consolidated/inventory for the current partition. Chaining multiple enrichment steps is intentionally not supported; downstream components that need extra per-row columns should pull them in via their own joins and fields.