Enrichment

Prepares consolidated data for downstream analysis. Dealer Classification groups dealers into spatial clusters used as coarser-geography dimensions; Inventory Enrichment adds computed columns (sold date, days on lot, half-life weights) that downstream statistics, models, and take-rate components rely on.

Dealer Classification

Groups dealers into spatial clusters ("neighborhoods") so coarser-geography statistics can be computed without using administrative boundaries like state or region. The component uses a grid-based spatial partitioning approach driven entirely by dealer latitude/longitude from consolidated dealers.

The output is a single <oem>/dealers/<name> asset — one row per dealer with a cluster label column.

Configuration

type: ai_core.components.DealerClassificationComponent
attributes:
  oem: mb
  name: neighborhoods
  label: neighborhood
  grid_size: 5
  start_date: "2025-01-01"

| Field | Description |
| --- | --- |
| `name` | Asset key segment under `<oem>/dealers/<name>` and Dagster group name |
| `label` | Column name for the cluster label in the output asset |
| `grid_size` | Grid cell size in degrees. Smaller values produce finer-grained clusters |
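
A minimal sketch of how a grid-based partition of this kind could assign labels, assuming each dealer falls into the cell found by flooring latitude and longitude divided by grid_size. The function name grid_cell_label, the cell_<row>_<col> label format, and the sample dealers are hypothetical; the component's actual labeling scheme may differ.

```python
import math


def grid_cell_label(lat: float, lon: float, grid_size: float) -> str:
    """Return a string label for the grid cell containing (lat, lon).

    Hypothetical labeling scheme: the real component may name cells differently.
    """
    row = math.floor(lat / grid_size)  # floor so negative coordinates bin consistently
    col = math.floor(lon / grid_size)
    return f"cell_{row}_{col}"


# Illustrative dealers only (not real data).
dealers = [
    {"dealer_id": "D1", "lat": 33.75, "lon": -84.39},
    {"dealer_id": "D2", "lat": 34.05, "lon": -84.10},
    {"dealer_id": "D3", "lat": 40.71, "lon": -74.01},
]

# One row per dealer with a cluster label column, mirroring the output asset shape.
labelled = [
    {"dealer_id": d["dealer_id"], "neighborhood": grid_cell_label(d["lat"], d["lon"], 5)}
    for d in dealers
]
```

With grid_size 5, D1 and D2 land in the same cell while D3 lands in a different one; a smaller grid_size would split nearby dealers into separate cells sooner.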

Output schema

[dealer_id, <label>, _partition_date]

The <label> column (e.g. neighborhood) contains a string identifier for each cluster.

Downstream usage

The classification asset is joined into Inventory Statistics, Take Rates, and Days on Lot via `joins` entries keyed on `dealer_id`. This lets those components group or fan out statistics by neighborhood without storing the cluster label on each vehicle row.
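
The join described above can be sketched in plain Python. This is only an illustration of the idea, assuming a left join on dealer_id; the row contents, the vin column, and the helper names are hypothetical, and the real components perform the join through their own configuration.

```python
from collections import Counter

# Hypothetical classification rows: one per dealer, as in the output schema.
classification = [
    {"dealer_id": "D1", "neighborhood": "cell_6_-17"},
    {"dealer_id": "D2", "neighborhood": "cell_6_-17"},
    {"dealer_id": "D3", "neighborhood": "cell_8_-15"},
]

# Hypothetical vehicle rows: they carry dealer_id but no cluster label.
inventory = [
    {"vin": "V1", "dealer_id": "D1"},
    {"vin": "V2", "dealer_id": "D2"},
    {"vin": "V3", "dealer_id": "D3"},
    {"vin": "V4", "dealer_id": "D1"},
]

# Build a lookup once, then attach the label at query time (left-join semantics:
# vehicles whose dealer has no classification row get None).
label_by_dealer = {row["dealer_id"]: row["neighborhood"] for row in classification}
joined = [{**v, "neighborhood": label_by_dealer.get(v["dealer_id"])} for v in inventory]

# Group a simple statistic (vehicle count) by neighborhood.
vehicles_per_neighborhood = Counter(row["neighborhood"] for row in joined)
```

Because the label lives in one place, changing grid_size only requires rematerializing the classification asset, not rewriting vehicle rows.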

Inventory Enrichment

Adds computed columns to consolidated inventory so downstream assets — statistics, models, take rates — can filter and weight rows without re-deriving sold-date logic each time.

The output is an <oem>/inventory_enriched asset with the same rows as consolidated inventory plus extra columns: an inferred sold date, a sold-in-window flag, half-life weights for recency-weighted aggregations, and any other Polars-SQL expressions configured in YAML.

Field expressions

Each field has a name and a Polars-SQL expr. Expressions evaluate against the full consolidated inventory frame (every column is in scope) and apply sequentially, so a later expression can reference a column produced by an earlier one:

type: ai_core.components.InventoryEnrichmentComponent
attributes:
  oem: mb
  name: inventory_enriched
  start_date: "2025-01-01"
  fields:
    - name: inferred_sold_date
      expr: >-
        CASE
          WHEN netstar_official_sold_date IS NOT NULL
            THEN netstar_official_sold_date
          ELSE GREATEST(nafta_last_seen, netstar_last_seen)
        END
    - name: days_on_lot
      expr: >-
        CASE
          WHEN inferred_sold_date IS NOT NULL
            THEN DATEDIFF('day', arrival_date, inferred_sold_date)
          ELSE NULL
        END
    - name: is_sold_in_window
      expr: >-
        inferred_sold_date IS NOT NULL
        AND inferred_sold_date >= DATE('{partition_date}') - INTERVAL 365 DAYS
        AND inferred_sold_date < DATE('{partition_date}')
    - name: is_in_inventory
      expr: inferred_sold_date IS NULL
    - name: hl90_weight
      expr: >-
        CASE
          WHEN is_sold_in_window
          THEN POW(2.0, -DATEDIFF('day', inferred_sold_date, DATE('{partition_date}')) / 90.0)
          ELSE NULL
        END
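
To illustrate the sequential evaluation, here is a rough Python sketch in which plain functions stand in for the first two Polars-SQL expressions. The names FIELDS and enrich are hypothetical, and treating GREATEST as "the latest non-null date" is a simplifying assumption about NULL handling, not a statement about the engine's behavior.

```python
from datetime import date


def inferred_sold_date(row):
    # Prefer the official sold date; otherwise fall back to the latest last-seen date.
    if row["netstar_official_sold_date"] is not None:
        return row["netstar_official_sold_date"]
    seen = [d for d in (row["nafta_last_seen"], row["netstar_last_seen"]) if d is not None]
    return max(seen) if seen else None  # assumption: NULL inputs are ignored


def days_on_lot(row):
    # References inferred_sold_date, which only exists because it was computed first.
    sold = row["inferred_sold_date"]
    return (sold - row["arrival_date"]).days if sold is not None else None


# Order matters: each function sees the columns added by the ones before it.
FIELDS = [("inferred_sold_date", inferred_sold_date), ("days_on_lot", days_on_lot)]


def enrich(row):
    out = dict(row)  # every original column stays in scope
    for name, fn in FIELDS:
        out[name] = fn(out)
    return out


# Illustrative row only (not real data).
enriched = enrich({
    "arrival_date": date(2025, 1, 10),
    "netstar_official_sold_date": None,
    "nafta_last_seen": date(2025, 3, 1),
    "netstar_last_seen": date(2025, 3, 10),
})
```

Swapping the two entries in FIELDS would raise a KeyError in this sketch, which is the analogue of referencing a column before the field that produces it.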

`{partition_date}` interpolation

Any expr may contain {partition_date}, which is replaced at materialization time with the current partition key (e.g. 2025-04-15). This is essential for sold-window flags and half-life decay weights.

Expressions that do not contain {partition_date} are unaffected.

Note: placeholders in {...} are filled with Python str.format(), so literal braces in an expression must be escaped as {{ and }}.
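
The following sketch shows the str.format() behavior this relies on, using the partition key from the example above. The expression strings are illustrative, and how the component invokes the formatting internally is an assumption.

```python
partition_key = "2025-04-15"

# An expression that references the partition date.
template = (
    "inferred_sold_date IS NOT NULL "
    "AND inferred_sold_date < DATE('{partition_date}')"
)
rendered = template.format(partition_date=partition_key)

# Literal braces must be doubled so str.format() does not treat them as placeholders.
escaped = "'{{literal}}' || '{partition_date}'".format(partition_date=partition_key)

# An expression without any placeholder passes through unchanged.
plain = "inferred_sold_date IS NULL".format(partition_date=partition_key)

print(rendered)
print(escaped)
print(plain)
```

An unescaped single brace pair such as {foo} would raise a KeyError under str.format(), which is why the escaping rule matters for any expression that contains braces of its own.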

Half-life weights

Half-life weights gate sales-period aggregations on is_sold_in_window (365-day look-back) and decay continuously within that window:

hl30_weight  = 2^(−days_since_sale / 30)   # aggressive recency; 30 days ago → 0.5
hl90_weight  = 2^(−days_since_sale / 90)   # moderate recency;   90 days ago → 0.5
hl180_weight = 2^(−days_since_sale / 180)  # broad window;      180 days ago → 0.5

Vehicles outside the window receive NULL. Within the window, the half-life (30, 90, or 180 days) controls how steeply the weight decays with time since sale: a shorter half-life discounts older sales faster. Each weight column powers one perspective in Take Rates.
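
As a worked check of the formulas above, this sketch computes a weight in plain Python, assuming whole-day date arithmetic and the same window bounds as is_sold_in_window (sold on or after partition date minus 365 days, and strictly before the partition date). The function name half_life_weight is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def half_life_weight(
    sold: Optional[date],
    partition: date,
    half_life_days: float,
    window_days: int = 365,
) -> Optional[float]:
    """Return 2^(-days_since_sale / half_life_days), or None outside the window."""
    if sold is None:
        return None
    days_since_sale = (partition - sold).days
    # Mirrors is_sold_in_window: 0 < days_since_sale <= window_days.
    if not (0 < days_since_sale <= window_days):
        return None
    return 2.0 ** (-days_since_sale / half_life_days)


partition = date(2025, 4, 15)
w30 = half_life_weight(partition - timedelta(days=30), partition, 30)     # one half-life ago
w90 = half_life_weight(partition - timedelta(days=30), partition, 90)     # same sale, slower decay
w_old = half_life_weight(partition - timedelta(days=400), partition, 90)  # outside the window
```

The same 30-day-old sale gets weight 0.5 under the 30-day half-life but roughly 0.79 under the 90-day one, which is the difference between the aggressive and moderate recency perspectives.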

Source asset

InventoryEnrichmentComponent always reads <oem>/consolidated/inventory for the current partition. Chaining multiple enrichment steps is intentionally not supported; downstream components that need extra per-row columns should pull them in via their own joins and fields.
