Take Rates

Computes per-attribute take rates and KPIs from featurized inventory, then blends them across geographic layers to produce smooth dealer-level estimates.

Attribute Statistics

Produces per-partition take rates and metric KPIs for vehicle attribute codes (paint colors, options, packages, …), grouped by configurable dimensions like dealer or carline.

The component is a pure aggregation engine: it does no date arithmetic or window logic of its own. All filtering and weighting columns must be pre-computed by Inventory Enrichment and flow through the inventory input.

Inputs

  • vehicle_featurized — One row per car with a vehicle_attributes column of native List(Struct({attribute_type, attribute_code})).
  • inventory — Enriched inventory; supplies filter, weight, and metric columns plus any per-vehicle dimension columns referenced by aggregations.
  • joins — Reference tables joined onto each vehicle to expose dimension columns used by aggregations (e.g. dealer → country, model → carline).
  • derived_fields — Polars-SQL expressions evaluated after inventory and joins land. Use these to combine columns coming from different sources (for example, abnormal_days_on_lot = days_on_lot - predicted_days_on_lot).

Aggregations

An aggregation defines the GROUP BY dimensions for one output asset. Each aggregation produces a separate Dagster asset.

aggregations:
  - key: country_by_carline
    group_by: [carline_id, model_catalog_id]
  - key: country_by_dealer
    group_by: [dealer_id]
  - key: country_by_dealer_carline
    group_by: [dealer_id, carline_id]

| Field | Description |
| --- | --- |
| key | Leaf segment of the asset key. Lands at <oem>/attribute_stats/<key> |
| group_by | Columns to group over. Must be present on the joined frame |

Statistics

A statistic defines one vehicle sub-population and what to measure. Multiple statistics are computed together and stored as columns in the same aggregation asset.

statistics:
  - label: sales
    filter_column: is_sold_in_window
    metric_columns: [days_on_lot]

  - label: sales_hl90
    filter_column: is_sold_in_window
    weight_column: hl90_weight
    metric_columns: [days_on_lot]

  - label: inventory
    filter_column: is_in_inventory
    metric_columns: [days_on_lot]

| Field | Required | Description |
| --- | --- | --- |
| label | Yes | Column prefix for output columns (e.g. sales, sales_hl90) |
| filter_column | No | Boolean column. Only rows where it is true are included. Missing column → empty result, no error |
| weight_column | No | Float weight applied uniformly to take rate and all metric aggregations. Missing column falls back to 1.0 |
| metric_columns | No | Numeric columns to compute weighted avg, diff, and z for. Missing columns are silently skipped |
Note: A single weight_column applies to both the take rate and the metric aggregations, so all output columns for a given statistic reflect the same "perspective" on the market. For two HL variants, define two statistics.
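The row-selection rules in the table above can be sketched in plain Python. This is a minimal illustration, not the component's implementation: the function name select_rows and the dict-per-vehicle representation are assumptions made for the example, and the handling of null weight values is not covered by the source.

```python
def select_rows(rows, filter_column=None, weight_column=None):
    """Return (row, weight) pairs for one statistic.

    Hypothetical helper: rows is a list of dicts, one per vehicle.
    """
    selected = []
    for row in rows:
        if filter_column is not None:
            # A missing filter column yields an empty result, not an error.
            if row.get(filter_column) is not True:
                continue
        if weight_column is not None and weight_column in row:
            weight = row[weight_column]
        else:
            # No weight column configured, or column missing: weight 1.0.
            weight = 1.0
        selected.append((row, weight))
    return selected


rows = [
    {"vin": "A", "is_sold_in_window": True, "hl90_weight": 0.5},
    {"vin": "B", "is_sold_in_window": False, "hl90_weight": 0.9},
    {"vin": "C", "is_sold_in_window": True, "hl90_weight": 0.25},
]

# Same filter, two weightings: these correspond to two separate statistics.
sales = select_rows(rows, "is_sold_in_window")
sales_hl90 = select_rows(rows, "is_sold_in_window", "hl90_weight")
```

Because the weight is resolved once per row, the take rate and every metric aggregation of a statistic see the same weights.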

Output columns

Each asset contains the grouping columns, then attribute_type / attribute_code, then a block of {label}_* columns for every statistic:

| Column | Description |
| --- | --- |
| {label}_take_rate | Share of group weight carried by vehicles with this attribute |
| {label}_group_nobs | Total weight (or count) in the group; the take-rate denominator |
| {label}_avg_{col} | Weighted mean of metric column for vehicles carrying this attribute |
| {label}_{col}_diff | Attribute mean minus group mean (negative → faster sales) |
| {label}_{col}_z | Diff divided by group weighted std. Null when std is zero or null |

Each asset carries one take_rate_sums_{label} asset check per statistic.

Weighting scheme

When weight_column is set:

  • Take rate numerator: sum(weight) over vehicles with the attribute.
  • Take rate denominator: sum(weight) over all vehicles in the group.
  • Weighted metric mean: sum(weight × metric) / sum(weight where metric not null).
  • Weighted variance: sum(w × x²) / sum(w) − (sum(w × x) / sum(w))².
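As a worked illustration of the formulas above, here is a small pure-Python sketch for one group and one attribute. The function name attribute_stats, the dict layout, and the returned keys are hypothetical. It also assumes that the variance sums run over vehicles with a non-null metric, mirroring the weighted mean; the source does not state this explicitly.

```python
import math


def attribute_stats(vehicles, attribute, metric):
    """Take rate, weighted mean, diff and z for one attribute in one group.

    Hypothetical layout: each vehicle is a dict with 'weight',
    'attributes' (a set of codes) and a metric value that may be None.
    """
    def moments(rows):
        # Restrict to rows where the metric is present (assumption for variance).
        rows = [r for r in rows if r[metric] is not None]
        sw = sum(r["weight"] for r in rows)
        if not sw:
            return None, None
        mean = sum(r["weight"] * r[metric] for r in rows) / sw
        var = sum(r["weight"] * r[metric] ** 2 for r in rows) / sw - mean ** 2
        return mean, math.sqrt(max(var, 0.0))  # clamp tiny negative rounding

    group_nobs = sum(v["weight"] for v in vehicles)
    with_attr = [v for v in vehicles if attribute in v["attributes"]]
    take_rate = sum(v["weight"] for v in with_attr) / group_nobs if group_nobs else None

    attr_mean, _ = moments(with_attr)
    group_mean, group_std = moments(vehicles)
    diff = None if attr_mean is None or group_mean is None else attr_mean - group_mean
    z = None if diff is None or not group_std else diff / group_std
    return {"take_rate": take_rate, "group_nobs": group_nobs,
            "avg": attr_mean, "diff": diff, "z": z}


vehicles = [
    {"weight": 1.0, "attributes": {"red"}, "days_on_lot": 10},
    {"weight": 1.0, "attributes": {"blue"}, "days_on_lot": 10},
    {"weight": 2.0, "attributes": {"blue"}, "days_on_lot": 30},
    {"weight": 1.0, "attributes": {"red"}, "days_on_lot": None},
]
stats = attribute_stats(vehicles, "red", "days_on_lot")
# take_rate = 2/5, avg = 10, group mean = 20, group std = 10, so diff = -10 and z = -1
```

The vehicle with a null metric still counts toward the take rate and group_nobs, but drops out of the mean and variance.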

Geographic Smoothing

GeographicSmootherComponent produces a single dealer-level asset that blends multiple Attribute Statistics layers into one smoothed estimate per (dealer_id, attribute_type, attribute_code). Dealers with few sales borrow strength from coarser geographies — neighborhood, then country — proportionally to sample size. The pattern replicates attribute_stats_smart from CDP.

Shrinkage blend

For each (dealer_id, attribute_type, attribute_code) cell and each declared field:

raw_weight_i = eval(field.weight_expr on layer_i stats)
weight_i     = raw_weight_i / divisor_i   if raw_weight_i / divisor_i > min_weight AND field value IS NOT NULL
             = 0                          otherwise

blended = Σ(weight_i × value_i) / Σ(weight_i)   [null when Σ(weight_i) = 0]

The weight_expr is a Polars-SQL expression, typically the group observation count (group_nobs) or the attribute-level count (group_nobs * take_rate). The divisor scales each coarser layer's weight down to roughly one dealer's worth of observations, so coarser layers (which have far more data) dominate the blend only when the dealer layer has few observations of its own.
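The blend for a single cell and a single field can be sketched as follows. This is a plain-Python illustration rather than the component's code: the function name blend and the tuple layout are hypothetical, the min_weight threshold is assumed to apply to the scaled weight (as the pseudocode reads), and a null raw weight is assumed to contribute nothing.

```python
def blend(layers, min_weight):
    """Shrinkage blend for one (dealer, attribute) cell and one field.

    layers: list of (raw_weight, divisor, value) tuples, finest to coarsest.
    Returns None when no layer contributes any weight.
    """
    numerator = 0.0
    denominator = 0.0
    for raw_weight, divisor, value in layers:
        if raw_weight is None or value is None:
            continue  # layer has no data for this cell: weight 0
        weight = raw_weight / divisor
        if weight <= min_weight:
            continue  # too little evidence: weight 0
        numerator += weight * value
        denominator += weight
    return numerator / denominator if denominator > 0 else None


# Dealer with 8 sales: all three layers contribute (weights 8, 10, 10).
busy = blend([(8, 1, 0.50), (150, 15, 0.30), (6000, 600, 0.20)], min_weight=5)

# Dealer with 3 sales: the dealer layer falls below min_weight and is dropped.
quiet = blend([(3, 1, 0.50), (150, 15, 0.30), (6000, 600, 0.20)], min_weight=5)
```

In the first call the result is (8 × 0.50 + 10 × 0.30 + 10 × 0.20) / 28 = 9/28; in the second the dealer's own estimate is ignored and the result is (3 + 2) / 20 = 0.25.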

Layers

Layers are ordered finest → coarsest. Layer 0 defines the base frame: every dealer in the assembled joins dimension frame appears in the output for every (attribute_type, attribute_code) pair observed in that layer. Coarser layers contribute weight=0 for dealers they have no data for.

| Layer | divisor | join_on | Meaning |
| --- | --- | --- | --- |
| dealer | 1 | [dealer_id] | Dealer's own sales; full weight per observation |
| neighborhood | 15 | [neighborhood] | ~15 dealers per neighborhood on average |
| country | 600 | [country] | All dealers in market; ~600× more data |

The joins list assembles the dimension frame mapping dealer_id to coarser geographies. The component joins coarser-layer stats onto that frame to fan them out to dealer granularity, so stat assets never need to carry a dealer_id column.
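The fan-out step can be pictured as a keyed join from the dimension frame onto a layer's stats. The sketch below uses plain Python dicts instead of data frames; the function name fan_out and the sample values are hypothetical and only illustrate the idea.

```python
def fan_out(dimension_frame, layer_stats, join_on):
    """Copy each layer stat row to every dealer sharing its join key.

    dimension_frame: one dict per dealer (dealer_id plus coarser geographies).
    layer_stats: stat rows keyed by the join_on columns, with no dealer_id.
    """
    by_key = {}
    for stat in layer_stats:
        key = tuple(stat[col] for col in join_on)
        by_key.setdefault(key, []).append(stat)

    rows = []
    for dealer in dimension_frame:
        key = tuple(dealer[col] for col in join_on)
        # Dealers whose key has no stats get no rows (weight 0 in the blend).
        for stat in by_key.get(key, []):
            rows.append({"dealer_id": dealer["dealer_id"], **stat})
    return rows


dimension_frame = [
    {"dealer_id": "d1", "neighborhood": "n1", "country": "DE"},
    {"dealer_id": "d2", "neighborhood": "n1", "country": "DE"},
    {"dealer_id": "d3", "neighborhood": "n2", "country": "DE"},
]
neighborhood_stats = [
    {"neighborhood": "n1", "attribute_type": "paint", "attribute_code": "040",
     "sales_take_rate": 0.3, "sales_group_nobs": 150},
]
fanned = fan_out(dimension_frame, neighborhood_stats, ["neighborhood"])
# d1 and d2 each receive the n1 row; d3 receives nothing from this layer.
```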

Fields

Each BlendFieldSpec declares one column to blend and the SQL expression that drives its weight. Only declared fields appear in the output; z-score columns ({label}_{col}_z) are intentionally excluded — z-scores from different geographic layers have incompatible denominators and cannot be meaningfully combined.

fields:
  - name: sales_take_rate
    weight_expr: sales_group_nobs

  - name: sales_avg_days_on_lot
    weight_expr: "sales_group_nobs * sales_take_rate"

Full YAML example

type: ai_core.components.GeographicSmootherComponent
attributes:
  oem: mb
  name: demand_take_rates
  start_date: "2025-01-01"
  joins:
    - asset: mb/consolidated/dealers
      join_on: [dealer_id]
      fields: [country]
    - asset: mb/dealers/neighborhoods
      join_on: [dealer_id]
      fields: [neighborhood]
  min_weight: 5
  layers:
    - name: dealer
      divisor: 1
      join_on: [dealer_id]
      asset: mb/attribute_stats/country_by_dealer
    - name: neighborhood
      divisor: 15
      join_on: [neighborhood]
      asset: mb/attribute_stats/country_by_neighborhood
    - name: country
      divisor: 600
      join_on: [country]
      asset: mb/attribute_stats/country_by_country
  fields:
    - name: sales_take_rate
      weight_expr: sales_group_nobs
    - name: sales_avg_days_on_lot
      weight_expr: "sales_group_nobs * sales_take_rate"

Output schema: [dealer_id, attribute_type, attribute_code, <declared fields>, _partition_date].