Take Rates

Computes per-attribute take rates and KPIs from featurized inventory, then blends them across geographic layers to produce smooth dealer-level estimates.

Attribute Statistics

Produces per-partition take rates and metric KPIs for vehicle attribute codes (paint colors, options, packages, …), grouped by configurable dimensions like dealer or carline.

The component is a pure aggregation engine: it does no date arithmetic or window logic of its own. All filtering and weighting columns must be pre-computed by Inventory Enrichment and flow through the vehicle_featurized input.

Inputs

vehicle_featurized — One row per car with a vehicle_attributes column of native List(Struct({attribute_type, attribute_code})).
inventory — Enriched inventory; supplies filter, weight, and metric columns plus any per-vehicle dimension columns referenced by aggregations.
joins — Reference tables joined onto each vehicle to expose dimension columns used by aggregations (e.g. dealer → country, model → carline).
derived_fields — Polars-SQL expressions evaluated after inventory and joins land. Use these to combine columns coming from different sources (for example, abnormal_days_on_lot = days_on_lot - predicted_days_on_lot).

Aggregations

An aggregation defines the GROUP BY dimensions for one output asset. Each aggregation produces a separate Dagster asset.

aggregations:
  - key: country_by_carline
    group_by: [carline_id, model_catalog_id]
  - key: country_by_dealer
    group_by: [dealer_id]
  - key: country_by_dealer_carline
    group_by: [dealer_id, carline_id]

Field	Description
`key`	Leaf segment of the asset key. Lands at `<oem>/attribute_stats/<key>`
`group_by`	Columns to group over. Must be present on the joined frame

Statistics

A statistic defines one vehicle sub-population and what to measure. Multiple statistics are computed together and stored as columns in the same aggregation asset.

statistics:
  - label: sales
    filter_column: is_sold_in_window
    metric_columns: [days_on_lot]

  - label: sales_hl90
    filter_column: is_sold_in_window
    weight_column: hl90_weight
    metric_columns: [days_on_lot]

  - label: inventory
    filter_column: is_in_inventory
    metric_columns: [days_on_lot]

Field	Required	Description
`label`	Yes	Column prefix for output columns (e.g. `sales`, `sales_hl90`)
`filter_column`	No	Boolean column. Only rows where it is `true` are included. Missing column → empty result, no error
`weight_column`	No	Float weight applied uniformly to take rate and all metric aggregations. Missing column falls back to `1.0`
`metric_columns`	No	Numeric columns to compute weighted avg, diff, and z for. Missing columns are silently skipped

Note: A single weight_column applies to both the take rate and the metric aggregations, so all output columns for a given statistic reflect the same "perspective" on the market. For two HL variants, define two statistics.

Output columns

Each asset contains the grouping columns, then attribute_type / attribute_code, then a block of {label}_* columns for every statistic:

Column	Description
`{label}_take_rate`	Share of group weight carried by vehicles with this attribute
`{label}_group_nobs`	Total weight (or count) in the group — the take-rate denominator
`{label}_avg_{col}`	Weighted mean of metric column for vehicles carrying this attribute
`{label}_{col}_diff`	Attribute mean minus group mean (negative → faster sales)
`{label}_{col}_z`	Diff divided by group weighted std. Null when std is zero or null
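
As a rough illustration of the last two columns, the sketch below derives the diff and z values for one attribute row from means and a standard deviation that are assumed to be already computed. The function name `diff_and_z` and its arguments are hypothetical and not part of the component.

```python
from typing import Optional, Tuple


def diff_and_z(
    attr_mean: Optional[float],
    group_mean: Optional[float],
    group_std: Optional[float],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (diff, z) for one attribute row (hypothetical helper)."""
    if attr_mean is None or group_mean is None:
        return None, None
    # Attribute mean minus group mean: negative means faster sales for days_on_lot.
    diff = attr_mean - group_mean
    # z is null when the group std is zero or null, as in the table above.
    if group_std is None or group_std == 0:
        return diff, None
    return diff, diff / group_std


diff, z = diff_and_z(attr_mean=30.0, group_mean=40.0, group_std=5.0)
flat_diff, flat_z = diff_and_z(attr_mean=30.0, group_mean=40.0, group_std=0.0)
```

Here the attribute sells 10 days faster than its group, which is two group standard deviations below the mean; with a zero std the diff is kept and z is null.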

Each asset carries one take_rate_sums_{label} asset check per statistic.

Weighting scheme

When weight_column is set:

Take rate numerator: sum(weight) over vehicles with the attribute.
Take rate denominator: sum(weight) over all vehicles in the group.
Weighted metric mean: sum(weight × metric) / sum(weight where metric not null).
Weighted variance: sum(w × x²) / sum(w) − (sum(w × x) / sum(w))².
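
The formulas above can be sketched in plain Python as follows. This is a minimal illustration, not the component's Polars implementation: the row layout (`attrs`, `weight`, `days_on_lot`), the helper names, and the sample vehicles are all assumptions, and the clamp on the variance is an added guard against floating-point rounding.

```python
def take_rate(rows, attribute):
    """Weighted share of the group carried by vehicles with `attribute`."""
    # A missing weight falls back to 1.0, mirroring the weight_column default.
    group_w = sum(r.get("weight", 1.0) for r in rows)
    attr_w = sum(r.get("weight", 1.0) for r in rows if attribute in r["attrs"])
    return (attr_w / group_w if group_w else None), group_w


def weighted_mean_var(rows, metric):
    """Weighted mean and variance over rows where the metric is not null."""
    valid = [(r.get("weight", 1.0), r[metric]) for r in rows if r.get(metric) is not None]
    sw = sum(w for w, _ in valid)
    if sw == 0:
        return None, None
    mean = sum(w * x for w, x in valid) / sw
    var = sum(w * x * x for w, x in valid) / sw - mean ** 2
    return mean, max(var, 0.0)  # clamp: guard against tiny negative rounding error


# Hypothetical group of three sold vehicles.
vehicles = [
    {"attrs": {"RED"}, "weight": 1.0, "days_on_lot": 10.0},
    {"attrs": {"RED"}, "weight": 0.5, "days_on_lot": 20.0},
    {"attrs": {"BLUE"}, "weight": 0.5, "days_on_lot": 40.0},
]

red_rate, group_nobs = take_rate(vehicles, "RED")
group_mean, group_var = weighted_mean_var(vehicles, "days_on_lot")
red_mean, _ = weighted_mean_var([v for v in vehicles if "RED" in v["attrs"]], "days_on_lot")
```

With these inputs the RED take rate is 1.5 / 2.0 = 0.75, the group mean is 20 days, and the group variance is 150.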

Geographic Smoothing

GeographicSmootherComponent produces a single dealer-level asset that blends multiple Attribute Statistics layers into one smoothed estimate per (dealer_id, attribute_type, attribute_code). Dealers with few sales borrow strength from coarser geographies — neighborhood, then country — proportionally to sample size. The pattern replicates attribute_stats_smart from CDP.

Shrinkage blend

For each (dealer_id, attribute_type, attribute_code) cell and each declared field:

raw_weight_i = eval(field.weight_expr on layer_i stats)
weight_i     = raw_weight_i / divisor_i
               if raw_weight_i > min_obs AND field value IS NOT NULL
             = 0  otherwise

blended = Σ(weight_i × value_i) / Σ(weight_i)   [null when Σ = 0]

The weight_expr is a Polars-SQL expression, typically the group observation count or the attribute-level count (group_nobs * take_rate). The divisor rescales each layer's weight to a roughly per-dealer scale, so coarser layers (which pool far more data) do not swamp the dealer's own statistics and dominate the blend only when the dealer layer has too few observations.
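
The blend can be sketched for a single cell and a single field as below. This is a minimal illustration under stated assumptions: the `blend` helper and the numbers are hypothetical, and each layer's raw weight is assumed to be already evaluated from weight_expr.

```python
def blend(layers, min_obs):
    """Blend one field for one (dealer, attribute) cell.

    layers: list of (raw_weight, divisor, value), ordered finest to coarsest.
    """
    num = 0.0
    den = 0.0
    for raw_weight, divisor, value in layers:
        # A layer gets weight 0 when its value is null or it has too few observations.
        if value is None or raw_weight is None or raw_weight <= min_obs:
            continue
        weight = raw_weight / divisor
        num += weight * value
        den += weight
    return num / den if den > 0 else None  # null when the weights sum to zero


# Hypothetical take rates: dealer, neighborhood, country.
well_observed = [(20, 1, 0.30), (300, 15, 0.20), (12000, 600, 0.10)]
sparse_dealer = [(3, 1, 0.30), (300, 15, 0.20), (12000, 600, 0.10)]

blended_full = blend(well_observed, min_obs=5)
blended_sparse = blend(sparse_dealer, min_obs=5)
```

In the first case every layer ends up with weight 20, so the result is the plain average 0.20. In the second case the dealer has only 3 observations, which does not exceed min_obs, so its value is dropped and the blend falls back to the average of the two coarser layers, 0.15.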

Layers

Layers are ordered finest → coarsest. Layer 0 defines the base frame: every dealer in the assembled joins dimension frame appears in the output for every (attribute_type, attribute_code) pair observed in that layer. Coarser layers contribute weight=0 for dealers they have no data for.

Layer	`divisor`	`join_on`	Meaning
`dealer`	`1`	`[dealer_id]`	Dealer's own sales — full weight per observation
`neighborhood`	`15`	`[neighborhood]`	~15 dealers per neighborhood on average
`country`	`600`	`[country]`	All dealers in market — ~600× more data

The joins list assembles the dimension frame mapping dealer_id to coarser geographies. The component joins coarser-layer stats onto that frame to fan them out to dealer granularity, so stat assets never need to carry a dealer_id column.
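
The fan-out step can be pictured with the sketch below. It is illustrative only: the real component works on frames and accepts a list of join_on columns, whereas this hypothetical `fan_out` helper uses plain dicts and a single join column, and the dealers and statistics are made up.

```python
# Dimension frame assembled from joins: one row per dealer.
dimension_frame = [
    {"dealer_id": "D1", "neighborhood": "N1", "country": "DE"},
    {"dealer_id": "D2", "neighborhood": "N1", "country": "DE"},
    {"dealer_id": "D3", "neighborhood": "N2", "country": "DE"},
]

# Neighborhood-level stats keyed by (neighborhood, attribute_type, attribute_code).
# Note that they carry no dealer_id.
neighborhood_stats = {
    ("N1", "PAINT_COLOR", "RED"): {"sales_take_rate": 0.2, "sales_group_nobs": 300},
}


def fan_out(dimension_frame, layer_stats, join_col):
    """Copy each coarse-layer row onto every dealer that maps to its geography."""
    out = {}
    for dealer in dimension_frame:
        for (geo, attr_type, attr_code), stats in layer_stats.items():
            if dealer[join_col] == geo:
                out[(dealer["dealer_id"], attr_type, attr_code)] = stats
    return out


per_dealer = fan_out(dimension_frame, neighborhood_stats, join_col="neighborhood")
```

D1 and D2 both receive the N1 row, while D3 gets nothing from this layer and would therefore contribute weight 0 for it in the blend.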

Fields

Each BlendFieldSpec declares one column to blend and the SQL expression that drives its weight. Only declared fields appear in the output; z-score columns ({label}_{col}_z) are intentionally excluded — z-scores from different geographic layers have incompatible denominators and cannot be meaningfully combined.

fields:
  - name: sales_take_rate
    weight_expr: sales_group_nobs

  - name: sales_avg_days_on_lot
    weight_expr: "sales_group_nobs * sales_take_rate"

Full YAML example

type: ai_core.components.GeographicSmootherComponent
attributes:
  oem: mb
  name: demand_take_rates
  start_date: "2025-01-01"
  joins:
    - asset: mb/consolidated/dealers
      join_on: [dealer_id]
      fields: [country]
    - asset: mb/dealers/neighborhoods
      join_on: [dealer_id]
      fields: [neighborhood]
  min_obs: 5
  layers:
    - name: dealer
      divisor: 1
      join_on: [dealer_id]
      asset: mb/attribute_stats/country_by_dealer
    - name: neighborhood
      divisor: 15
      join_on: [neighborhood]
      asset: mb/attribute_stats/country_by_neighborhood
    - name: country
      divisor: 600
      join_on: [country]
      asset: mb/attribute_stats/country_by_country
  fields:
    - name: sales_take_rate
      weight_expr: sales_group_nobs
    - name: sales_avg_days_on_lot
      weight_expr: "sales_group_nobs * sales_take_rate"

Output schema: [dealer_id, attribute_type, attribute_code, <declared fields>, _partition_date].

Demand Imputation

ImputedDemandComponent adjusts the smoothed take-rates for a model-year transition, so next-year synthetic candidates are scored against demand that reflects what actually changed rather than the prior year's option set. It reads smoothed_demand_take_rates and the enriched models entity, resolves each rule's model_filters (regexes on model-table fields, including trim_identifier) to concrete model_col values, and rewrites the affected (attribute_type, attribute_code) rows. All rules operate at the attribute level, never on raw OEM option codes.

Rule types

Rule	Purpose
`RenameRule`	Transfer demand from a discontinued donor attribute to its replacement (zero-sum; same-`donor_code` rules are additive, cross-group multiplicative).
`RemoveRule`	Zero an attribute that is no longer a customer choice on the trim — either a genuine drop, or (the common transition case) a feature that became standard, whose stale optional take-rate must be cleared since the pipeline has no next-year sales yet to observe the new 100%.
`SeedRule`	Estimate a newly-available attribute's rate from a same-population proxy attribute's own rate (`portion` scales it; `1.0` = deterministic co-occurrence).
`FlatSeedRule`	Seed a newly-available attribute at a manually assumed flat rate — last resort when no same-population proxy exists. An explicit guess, to be replaced by real sales data.

Partition attributes (e.g. PAINT_COLOR, ROOF_TYPE) are renormalized back to sum-1 after removes/seeds; mutually-inclusive types (mutually_inclusive_attribute_types, e.g. FUNCTIONAL_ATTRIBUTE) are left at their reduced/expanded sum. A single removed/seeded value cascades automatically to any _X_ combination and _SET composite attribute that contains it, via exact token matching.
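
The difference between the two kinds of attribute type can be sketched as below. This is a simplified illustration for a remove on a single dealer and model: the `apply_remove` helper, its `renormalize` flag, and the rates are hypothetical, and the cascade to combination and composite attributes is not modelled.

```python
def apply_remove(rates, removed_code, renormalize):
    """Zero one attribute code; optionally rescale the rest back to sum 1.

    rates: dict of attribute_code -> take rate for one attribute_type.
    """
    out = {code: (0.0 if code == removed_code else rate) for code, rate in rates.items()}
    if renormalize:
        total = sum(out.values())
        if total > 0:
            out = {code: rate / total for code, rate in out.items()}
    return out


# Partition type: exactly one value per vehicle, so rates must sum to 1.
paint = apply_remove({"RED": 0.5, "BLUE": 0.3, "GREEN": 0.2}, "GREEN", renormalize=True)

# Mutually-inclusive type: a vehicle can carry several, so the sum is left reduced.
functional = apply_remove({"NAVIGATION_SYSTEM": 0.6, "TOW_PACKAGE": 0.4},
                          "NAVIGATION_SYSTEM", renormalize=False)
```

After removing GREEN, the remaining paint colors are rescaled to 0.625 and 0.375, while TOW_PACKAGE keeps its original 0.4.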

Apply order

Rules are applied in a fixed order: renames → removes → flat seeds → proxy seeds. Flat seeds run before proxy seeds so a flat-seeded attribute can itself be a SeedRule proxy — e.g. a new package seeded at an assumed rate, then a feature that ships with that package seeded off it (portion: 1.0), keeping the assumed rate defined in exactly one place.
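
Why the seed order matters can be seen in a small sketch. It covers only the last two steps (flat seeds, then proxy seeds) for one dealer and model, using plain dicts in place of the rule classes; the `apply_seeds` helper is hypothetical, and the rule keys follow the YAML config in this section.

```python
def apply_seeds(rates, flat_seed_rules, seed_rules):
    """Apply flat seeds first, then proxy seeds that may read the flat-seeded rates."""
    out = dict(rates)
    for rule in flat_seed_rules:
        out[rule["target_code"]] = rule["rate"]
    for rule in seed_rules:
        proxy_rate = out.get(rule["proxy_code"])
        if proxy_rate is not None:
            # portion scales the proxy's rate; 1.0 means deterministic co-occurrence.
            out[rule["target_code"]] = proxy_rate * rule["portion"]
    return out


flat_seed_rules = [{"target_code": "TOW_PACKAGE", "rate": 0.35}]
seed_rules = [{"target_code": "PREMIUM_AUDIO", "proxy_code": "TOW_PACKAGE", "portion": 1.0}]

seeded = apply_seeds({}, flat_seed_rules, seed_rules)

# If no flat seed had run, the proxy would have no rate to inherit.
unseeded = apply_seeds({}, [], seed_rules)
```

With the flat seed applied first, PREMIUM_AUDIO inherits the assumed 0.35 from TOW_PACKAGE; without it, the proxy has no rate and the target is left unset in this sketch.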

Scoping to trims

The same option code often changes differently across trims of one model (standard on one, a new choice on another, repackaged on a third), so rules are scoped with model_filters.trim_identifier rather than model-wide. A feature that became standard is removed only on the trims where that happened; a feature that stayed a choice is left untouched.
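
Resolving model_filters to concrete model_col values might look like the sketch below. It assumes each pattern is applied with `re.search` and that all filters must match (logical AND); the `resolve_models` helper and the model rows are hypothetical.

```python
import re

# Hypothetical rows from the enriched models entity.
models = [
    {"base_model_id": "M1", "model_code": "JLJS74", "trim_identifier": "R"},
    {"base_model_id": "M2", "model_code": "JLJS74", "trim_identifier": "RX"},
    {"base_model_id": "M3", "model_code": "JLJP74", "trim_identifier": "R"},
]


def resolve_models(models, model_filters, model_col="base_model_id"):
    """Return model_col values for models matching every regex in model_filters."""
    return [
        m[model_col]
        for m in models
        if all(
            re.search(pattern, str(m.get(field, "")))
            for field, pattern in model_filters.items()
        )
    ]


matched = resolve_models(models, {"model_code": "^JLJS74", "trim_identifier": "^R$"})
```

Only M1 matches: M2 shares the model code but its trim `RX` fails the anchored `^R$`, and M3 has the right trim on a different model code. Anchoring the trim pattern is what keeps a rule from leaking onto sibling trims.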

YAML config

type: ai_core.components.ImputedDemandComponent
attributes:
  name: demand_take_rates
  start_date: "2025-01-01"
  smoothed_stats: <oem>/<market>/smoothed_demand_take_rates
  models: <oem>/<market>/enriched/models
  model_col: base_model_id
  mutually_inclusive_attribute_types: [FUNCTIONAL_ATTRIBUTE]
  remove_rules:
    - model_filters: {model_code: "^JLJS74", trim_identifier: "^R$"}
      donor_code: "NAVIGATION_SYSTEM"    # became standard on Rubicon R
  flat_seed_rules:
    - model_filters: {model_code: "^JLJS74", trim_identifier: "^R$"}
      target_code: "TOW_PACKAGE"         # new package, no proxy → assumed rate
      rate: 0.35
  seed_rules:
    - model_filters: {model_code: "^JLJS74", trim_identifier: "^R$"}
      target_code: "PREMIUM_AUDIO"       # ships with the package above
      proxy_code: "TOW_PACKAGE"          # inherits its rate (applied after flat seeds)
      portion: 1.0
  demand_cols_to_impute: [sales_take_rate, sales_hl90_take_rate]

Output schema matches the smoothed input: [dealer_id, attribute_type, attribute_code, <demand cols>, _partition_date].
