Take Rates
Computes per-attribute take rates and KPIs from featurized inventory, then blends them across geographic layers to produce smooth dealer-level estimates.
Attribute Statistics
Produces per-partition take rates and metric KPIs for vehicle attribute codes (paint colors, options, packages, …), grouped by configurable dimensions like dealer or carline.
The component is a pure aggregation engine: it does no date arithmetic or window logic of its own. All filtering and weighting columns must be pre-computed by Inventory Enrichment and flow through the vehicle_featurized input.
Inputs
- vehicle_featurized: one row per car, with a vehicle_attributes column of native List(Struct({attribute_type, attribute_code})).
- inventory: enriched inventory; supplies filter, weight, and metric columns, plus any per-vehicle dimension columns referenced by aggregations.
- joins: reference tables joined onto each vehicle to expose dimension columns used by aggregations (e.g. dealer → country, model → carline).
- derived_fields: Polars-SQL expressions evaluated after inventory and joins land. Use these to combine columns coming from different sources (for example, abnormal_days_on_lot = days_on_lot - predicted_days_on_lot).
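To picture how these inputs meet, here is a minimal sketch in plain Python, with dicts standing in for the Polars frames. It flattens vehicle_attributes to one row per (vehicle, attribute) and evaluates the abnormal_days_on_lot derived field from the example above. The vin key, the sample values, and the helper name explode_attributes are hypothetical; the real component works on Polars frames and may order these steps differently.

```python
# Hypothetical sketch: plain dicts stand in for the Polars frames.
# "vin" as the vehicle key and all sample values are assumptions.

vehicle_featurized = [
    {"vin": "V1", "vehicle_attributes": [
        {"attribute_type": "paint", "attribute_code": "BLK"},
        {"attribute_type": "option", "attribute_code": "SUNROOF"},
    ]},
    {"vin": "V2", "vehicle_attributes": [
        {"attribute_type": "paint", "attribute_code": "WHT"},
    ]},
]

# Enriched inventory columns, keyed by the (assumed) vehicle key.
inventory = {
    "V1": {"days_on_lot": 40.0, "predicted_days_on_lot": 30.0},
    "V2": {"days_on_lot": 20.0, "predicted_days_on_lot": 25.0},
}


def explode_attributes(featurized, inv):
    """Return one row per (vehicle, attribute) with the vehicle's columns attached."""
    rows = []
    for car in featurized:
        cols = dict(inv[car["vin"]])
        # Derived field: combines columns once the inventory columns have landed.
        cols["abnormal_days_on_lot"] = cols["days_on_lot"] - cols["predicted_days_on_lot"]
        for attr in car["vehicle_attributes"]:
            rows.append({"vin": car["vin"], **attr, **cols})
    return rows


for row in explode_attributes(vehicle_featurized, inventory):
    print(row["vin"], row["attribute_code"], row["abnormal_days_on_lot"])
```

A vehicle carrying two attributes contributes one row to each attribute, while its vehicle-level columns (including derived fields) are repeated on both rows.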
Aggregations
An aggregation defines the GROUP BY dimensions for one output asset. Each aggregation produces a separate Dagster asset.
```yaml
aggregations:
  - key: country_by_carline
    group_by: [carline_id, model_catalog_id]
  - key: country_by_dealer
    group_by: [dealer_id]
  - key: country_by_dealer_carline
    group_by: [dealer_id, carline_id]
```
| Field | Description |
|---|---|
| key | Leaf segment of the asset key. Lands at `<oem>/attribute_stats/<key>` |
| group_by | Columns to group over. Must be present on the joined frame |
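The two rules in this table can be sketched as a small planning step that derives each asset key and checks its group_by columns. This is a minimal illustration, not the component's code: plan_aggregations is a hypothetical name, and raising ValueError for a missing group_by column is an assumption about how the failure surfaces.

```python
# Hypothetical sketch: derive asset keys and check group_by columns up front.

def plan_aggregations(oem, aggregations, joined_columns):
    """Map each aggregation's asset key to its group_by columns."""
    plan = {}
    for agg in aggregations:
        missing = [c for c in agg["group_by"] if c not in joined_columns]
        if missing:
            # Assumption: a missing dimension column is reported as an error.
            raise ValueError(f"{agg['key']}: not on joined frame: {missing}")
        plan[f"{oem}/attribute_stats/{agg['key']}"] = list(agg["group_by"])
    return plan


aggregations = [
    {"key": "country_by_carline", "group_by": ["carline_id", "model_catalog_id"]},
    {"key": "country_by_dealer", "group_by": ["dealer_id"]},
    {"key": "country_by_dealer_carline", "group_by": ["dealer_id", "carline_id"]},
]
joined = {"dealer_id", "carline_id", "model_catalog_id", "country"}
print(plan_aggregations("mb", aggregations, joined))
```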
Statistics
A statistic defines one vehicle sub-population and what to measure. Multiple statistics are computed together and stored as columns in the same aggregation asset.
```yaml
statistics:
  - label: sales
    filter_column: is_sold_in_window
    metric_columns: [days_on_lot]
  - label: sales_hl90
    filter_column: is_sold_in_window
    weight_column: hl90_weight
    metric_columns: [days_on_lot]
  - label: inventory
    filter_column: is_in_inventory
    metric_columns: [days_on_lot]
```
| Field | Required | Description |
|---|---|---|
| label | Yes | Column prefix for output columns (e.g. sales, sales_hl90) |
| filter_column | No | Boolean column. Only rows where it is true are included. Missing column → empty result, no error |
| weight_column | No | Float weight applied uniformly to the take rate and all metric aggregations. Missing column falls back to 1.0 |
| metric_columns | No | Numeric columns for which the weighted avg, diff, and z are computed. Missing columns are silently skipped |
A single weight_column applies to both the take rate and the metric aggregations, so all output columns for a given statistic reflect the same "perspective" on the market. To measure the same sub-population under two weightings, define two statistics, as sales and sales_hl90 do above.
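The fallback rules in the table above can be sketched as follows. This assumes plain Python dicts as rows, assumes that leaving filter_column unset keeps every row (the table only covers a missing column), and uses a hypothetical helper name, select_rows; it illustrates the documented behaviour for missing columns, not the component's implementation.

```python
# Hypothetical sketch of resolving one statistic against the available columns.

def select_rows(rows, filter_column=None, weight_column=None, metric_columns=()):
    """Return (rows tagged with a "_w" weight, metric columns that exist)."""
    columns = set().union(*(r.keys() for r in rows))
    if filter_column is not None:
        if filter_column not in columns:
            return [], []  # missing filter column: empty result, no error
        rows = [r for r in rows if r.get(filter_column) is True]
    # Assumption: no filter_column means every row is in the sub-population.
    has_weight = weight_column is not None and weight_column in columns
    # Missing weight column: every row falls back to weight 1.0.
    tagged = [{**r, "_w": r[weight_column] if has_weight else 1.0} for r in rows]
    metrics = [c for c in metric_columns if c in columns]  # missing metrics skipped
    return tagged, metrics


rows = [
    {"is_sold_in_window": True, "hl90_weight": 0.5, "days_on_lot": 12.0},
    {"is_sold_in_window": False, "hl90_weight": 1.0, "days_on_lot": 30.0},
]
sales_hl90, metrics = select_rows(
    rows, "is_sold_in_window", "hl90_weight", ["days_on_lot", "msrp"]
)
print(sales_hl90, metrics)
```

Here msrp is silently dropped because no row carries it, and the unsold vehicle is filtered out before any weighting happens.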
Output columns
Each asset contains the grouping columns, then attribute_type / attribute_code, then a block of {label}_* columns for every statistic:
| Column | Description |
|---|---|
| {label}_take_rate | Share of group weight carried by vehicles with this attribute |
| {label}_group_nobs | Total weight (or count, when unweighted) in the group; the take-rate denominator |
| {label}_avg_{col} | Weighted mean of the metric column for vehicles carrying this attribute |
| {label}_{col}_diff | Attribute mean minus group mean (for days_on_lot, negative means the attribute sells faster than the group) |
| {label}_{col}_z | Diff divided by the group weighted std. Null when std is zero or null |
Each asset carries one take_rate_sums_{label} asset check per statistic.
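For a single group and attribute code, the output columns can be computed as in the sketch below. It is a minimal illustration assuming plain Python rows with a weight w and a set of attribute codes, an attribute code that actually occurs in the group, and no null metric values (null handling follows the weighting rules in the next section); output_block and the row layout are hypothetical.

```python
import math

# Hypothetical sketch: one statistic, one group, one attribute code, no null metrics.

def output_block(group, code, metric, label):
    """Return the {label}_* columns for `code` within `group`."""
    nobs = sum(r["w"] for r in group)                  # take-rate denominator
    carriers = [r for r in group if code in r["codes"]]
    take_rate = sum(r["w"] for r in carriers) / nobs   # share of group weight

    def wmean(rs):
        return sum(r["w"] * r[metric] for r in rs) / sum(r["w"] for r in rs)

    attr_mean, group_mean = wmean(carriers), wmean(group)
    # Group weighted variance: sum(w x^2)/sum(w) - mean^2, clamped against rounding.
    var = max(sum(r["w"] * r[metric] ** 2 for r in group) / nobs - group_mean ** 2, 0.0)
    std = math.sqrt(var)
    diff = attr_mean - group_mean
    return {
        f"{label}_take_rate": take_rate,
        f"{label}_group_nobs": nobs,
        f"{label}_avg_{metric}": attr_mean,
        f"{label}_{metric}_diff": diff,
        f"{label}_{metric}_z": diff / std if std > 0 else None,  # null when std is zero
    }


group = [
    {"w": 1.0, "codes": {"BLK"}, "days_on_lot": 10.0},
    {"w": 1.0, "codes": {"BLK"}, "days_on_lot": 20.0},
    {"w": 1.0, "codes": {"WHT"}, "days_on_lot": 30.0},
    {"w": 1.0, "codes": {"WHT"}, "days_on_lot": 40.0},
]
block = output_block(group, "BLK", "days_on_lot", "sales")
print(block)
```

In this toy group BLK vehicles average 15 days on lot against a group mean of 25, so the diff is -10 and the z-score is negative: BLK sells faster than the group.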
Weighting scheme
When weight_column is set:
- Take rate numerator: sum(weight) over vehicles with the attribute.
- Take rate denominator: sum(weight) over all vehicles in the group.
- Weighted metric mean: sum(weight × metric) / sum(weight where metric not null).
- Weighted variance: sum(w × x²) / sum(w) − (sum(w × x) / sum(w))².
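The mean and variance formulas can be checked against a direct two-pass computation. The sketch below drops rows with a null metric before both sums, as the mean's denominator states; applying the same filter to the variance is an assumption here, and weighted_stats is a hypothetical name.

```python
# Hypothetical sketch: one-pass weighted mean/variance with null metrics skipped.

def weighted_stats(values, weights):
    """Return (mean, variance) using sum(w*x^2)/sum(w) - (sum(w*x)/sum(w))^2."""
    pairs = [(x, w) for x, w in zip(values, weights) if x is not None]
    sw = sum(w for _, w in pairs)              # sum(weight where metric not null)
    if sw == 0:
        return None, None
    mean = sum(w * x for x, w in pairs) / sw
    var = sum(w * x * x for x, w in pairs) / sw - mean ** 2
    return mean, var


values = [10.0, 20.0, None, 40.0]
weights = [1.0, 2.0, 3.0, 1.0]   # the 3.0 weight is ignored: its metric is null
mean, var = weighted_stats(values, weights)

# Cross-check with the two-pass definition sum(w * (x - mean)^2) / sum(w).
valid = [(x, w) for x, w in zip(values, weights) if x is not None]
two_pass = sum(w * (x - mean) ** 2 for x, w in valid) / sum(w for _, w in valid)
print(mean, var, two_pass)  # prints 22.5 118.75 118.75
```

The one-pass form needs only three running sums, which suits a single group-by aggregation, but it can lose precision through cancellation when the mean is large relative to the spread.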
Geographic Smoothing
GeographicSmootherComponent produces a single dealer-level asset that blends multiple Attribute Statistics layers into one smoothed estimate per (dealer_id, attribute_type, attribute_code). Dealers with few sales borrow strength from coarser geographies — neighborhood, then country — proportionally to sample size. The pattern replicates attribute_stats_smart from CDP.
Shrinkage blend
For each (dealer_id, attribute_type, attribute_code) cell and each declared field:
```
raw_weight_i = eval(field.weight_expr on layer_i stats)
weight_i     = raw_weight_i / divisor_i    if raw_weight_i / divisor_i > min_weight AND value_i IS NOT NULL
             = 0                           otherwise
blended      = Σ(weight_i × value_i) / Σ(weight_i)    [null when Σ(weight_i) = 0]
```
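The weight_expr is a Polars-SQL expression, typically the group observation count (group_nobs) or the attribute-level count (group_nobs * take_rate). The divisor scales each layer's weight down by roughly the number of dealers that layer pools (see the Layers table), so a neighborhood or country estimate counts for about as much as one average dealer's data. A dealer with many observations therefore dominates its own estimate, while a dealer whose weight falls at or below min_weight drops out and its estimate comes entirely from the coarser layers.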
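The blend for one cell and one field can be sketched in a few lines. This is a minimal illustration, assuming each layer's weight_expr has already been evaluated to a number (None where the layer has no row for the dealer); blend_cell and the tuple layout are hypothetical.

```python
# Hypothetical sketch of the shrinkage blend for one (dealer, attribute) cell.

def blend_cell(layers, min_weight):
    """layers: (raw_weight, divisor, value) per layer, finest first."""
    num = den = 0.0
    for raw_weight, divisor, value in layers:
        if raw_weight is None or value is None:
            continue  # no data or null value: weight 0
        weight = raw_weight / divisor
        if weight > min_weight:  # at or below min_weight: weight 0
            num += weight * value
            den += weight
    return num / den if den > 0 else None  # null when the weights sum to 0


# Divisors 1 / 15 / 600 and min_weight 5 follow the full YAML example.
thin_dealer = [(4, 1, 0.30), (150, 15, 0.20), (6000, 600, 0.10)]
busy_dealer = [(30, 1, 0.30), (150, 15, 0.20), (6000, 600, 0.10)]
print(blend_cell(thin_dealer, 5), blend_cell(busy_dealer, 5))
```

With min_weight: 5, a layer contributes only once its raw weight exceeds 5 × divisor: more than 5 at the dealer layer, more than 75 at the neighborhood layer, and more than 3,000 at the country layer. The thin dealer above (raw weight 4) therefore takes its value entirely from the coarser layers (0.15), while the busy dealer (raw weight 30) pulls the estimate toward its own 0.30 (0.24).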
Layers
Layers are ordered finest → coarsest. Layer 0 defines the base frame: every dealer in the dimension frame assembled from joins appears in the output for every (attribute_type, attribute_code) pair observed in layer 0. Coarser layers contribute weight 0 for dealers they have no data for.
| Layer | divisor | join_on | Meaning |
|---|---|---|---|
| dealer | 1 | [dealer_id] | Dealer's own sales; full weight per observation |
| neighborhood | 15 | [neighborhood] | ~15 dealers per neighborhood on average |
| country | 600 | [country] | All dealers in the market; ~600× the data of one dealer |
The joins list assembles the dimension frame that maps dealer_id to coarser geographies. The component joins coarser-layer stats onto that frame to fan them out to dealer granularity, so coarser-layer stat assets never need to carry a dealer_id column.
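The base frame and fan-out can be sketched with dictionaries standing in for the dimension frame and the layer stats. The dealer, neighborhood, and country identifiers and the helper name fan_out are hypothetical; the sketch only shows which (dealer, attribute) cells exist and which layer value each cell sees.

```python
# Hypothetical sketch: build the dealer-level base frame from layer 0 and
# fan coarser-layer values out to dealers through the joins dimension frame.

dimensions = {  # dealer_id -> coarser geographies assembled from the joins list
    "D1": {"neighborhood": "N1", "country": "DE"},
    "D2": {"neighborhood": "N1", "country": "DE"},
}
dealer_layer = {("D1", "paint", "BLK"): 0.30}
coarser_layers = [
    ("neighborhood", {("N1", "paint", "BLK"): 0.20}),
    ("country", {("DE", "paint", "BLK"): 0.10, ("DE", "paint", "WHT"): 0.50}),
]


def fan_out(dimensions, dealer_layer, coarser_layers):
    """Return {(dealer_id, type, code): [value per layer, finest first]}."""
    # Layer 0 decides which (attribute_type, attribute_code) pairs exist.
    pairs = sorted({(t, c) for (_, t, c) in dealer_layer})
    frame = {}
    for dealer_id, geo in dimensions.items():
        for t, c in pairs:
            values = [dealer_layer.get((dealer_id, t, c))]
            for join_col, stats in coarser_layers:
                # None here becomes weight 0 in the blend.
                values.append(stats.get((geo[join_col], t, c)))
            frame[(dealer_id, t, c)] = values
    return frame


for key, values in fan_out(dimensions, dealer_layer, coarser_layers).items():
    print(key, values)
```

D2 has no dealer-layer row yet still appears, borrowing its neighborhood and country values; WHT never appears because layer 0 never observed it.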
Fields
Each BlendFieldSpec declares one column to blend and the SQL expression that drives its weight. Only declared fields appear in the output; z-score columns ({label}_{col}_z) are intentionally excluded — z-scores from different geographic layers have incompatible denominators and cannot be meaningfully combined.
```yaml
fields:
  - name: sales_take_rate
    weight_expr: sales_group_nobs
  - name: sales_avg_days_on_lot
    weight_expr: "sales_group_nobs * sales_take_rate"
```
Full YAML example
```yaml
type: ai_core.components.GeographicSmootherComponent
attributes:
  oem: mb
  name: demand_take_rates
  start_date: "2025-01-01"
  joins:
    - asset: mb/consolidated/dealers
      join_on: [dealer_id]
      fields: [country]
    - asset: mb/dealers/neighborhoods
      join_on: [dealer_id]
      fields: [neighborhood]
  min_weight: 5
  layers:
    - name: dealer
      divisor: 1
      join_on: [dealer_id]
      asset: mb/attribute_stats/country_by_dealer
    - name: neighborhood
      divisor: 15
      join_on: [neighborhood]
      asset: mb/attribute_stats/country_by_neighborhood
    - name: country
      divisor: 600
      join_on: [country]
      asset: mb/attribute_stats/country_by_country
  fields:
    - name: sales_take_rate
      weight_expr: sales_group_nobs
    - name: sales_avg_days_on_lot
      weight_expr: "sales_group_nobs * sales_take_rate"
```
Output schema: [dealer_id, attribute_type, attribute_code, <declared fields>, _partition_date].