Vehicle Features

Classifies vehicle options into structured attribute types and assembles a normalized per-vehicle asset that drives take-rate aggregations.

Feature Enrichment

Classifies vehicle options and packages into structured attribute types (PAINT_COLOR, OPTION_SOUND_SYSTEM, etc.) so downstream aggregations can group by attribute code without parsing raw option strings. The component reads consolidated features and emits a <oem>/feature_enriched asset with a vehicle_attributes column of type List(Struct({attribute_type, attribute_code})).

Attribute specs

FeaturizationComponent ships with built-in specs for common attribute types (paint colors, fabric colors, wheel categories, option and package categories). OEM-specific specs are added via additional_attributes:

type: ai_core.components.FeaturizationComponent
attributes:
  oem: mb
  name: feature_enriched
  start_date: "2025-01-01"
  additional_attributes:
    - feature_type: OPTION
      attribute_name: OPTION_SOUND_SYSTEM
      field: name
      regexes:
        - value: UPGRADED
          regex: "(burmester )"

Field | Description
feature_type | Broad category; maps to the attribute_type column in the output
attribute_name | Specific attribute subtype (e.g. OPTION_SOUND_SYSTEM)
field | Feature field the regexes are matched against (e.g. name, code)
regexes | Ordered list of {value, regex} pairs; the first regex that matches sets attribute_code to value
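The first-match rule in the table can be sketched in plain Python. This is a hypothetical model, not the FeaturizationComponent implementation: AttributeSpec and classify are illustrative names. The sketch also makes three assumptions. First, feature_type selects which feature records a spec applies to. Second, attribute_name is what lands in attribute_type, which is consistent with the attribute types used elsewhere on this page (PAINT_COLOR, OPTION_SOUND_SYSTEM). Third, patterns are applied with re.search exactly as written, so matching is case-sensitive.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class AttributeSpec:
    """Hypothetical Python mirror of one additional_attributes entry."""
    feature_type: str               # e.g. "OPTION"
    attribute_name: str             # e.g. "OPTION_SOUND_SYSTEM"
    field: str                      # feature field to match, e.g. "name"
    regexes: list[tuple[str, str]]  # ordered (value, regex) pairs


def classify(feature: dict, spec: AttributeSpec) -> dict | None:
    """Return one vehicle_attributes element, or None if the spec does not apply."""
    # Assumption: feature_type filters which feature records the spec considers.
    if feature.get("feature_type") != spec.feature_type:
        return None
    text = feature.get(spec.field) or ""
    for value, pattern in spec.regexes:
        # Case-sensitive search as written; the first matching pair wins.
        if re.search(pattern, text):
            # Assumption: attribute_name is emitted as attribute_type.
            return {"attribute_type": spec.attribute_name, "attribute_code": value}
    return None


sound = AttributeSpec(
    feature_type="OPTION",
    attribute_name="OPTION_SOUND_SYSTEM",
    field="name",
    regexes=[("UPGRADED", r"(burmester )")],
)
print(classify({"feature_type": "OPTION", "name": "burmester surround sound"}, sound))
# → {'attribute_type': 'OPTION_SOUND_SYSTEM', 'attribute_code': 'UPGRADED'}
```

Because the pairs are tried in order, a broader regex placed first will shadow narrower ones listed after it.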

Output schema

[car_id, vehicle_attributes, _partition_date]

vehicle_attributes is a native Polars List(Struct) — each element carries attribute_type and attribute_code. This is the format expected by Take Rates and Vehicle Featurization.
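Each element already carries attribute_type and attribute_code, so an aggregation can count cars per attribute code without any string parsing. The sketch below uses plain Python dicts in place of the Polars frame. cars_per_attribute is a hypothetical helper, the attribute codes are made up, and counting each car at most once per (type, code) pair is an assumption about the intended aggregation.

```python
from __future__ import annotations

from collections import Counter


def cars_per_attribute(rows: list[dict]) -> Counter:
    """Count distinct cars per (attribute_type, attribute_code) pair."""
    counts: Counter = Counter()
    for row in rows:
        # De-duplicate within one car so a repeated element is counted once.
        pairs = {(a["attribute_type"], a["attribute_code"]) for a in row["vehicle_attributes"]}
        counts.update(pairs)
    return counts


# Hypothetical rows in the [car_id, vehicle_attributes, _partition_date] shape.
rows = [
    {"car_id": "c1", "_partition_date": "2025-01-01", "vehicle_attributes": [
        {"attribute_type": "PAINT_COLOR", "attribute_code": "BLACK"},
        {"attribute_type": "OPTION_SOUND_SYSTEM", "attribute_code": "UPGRADED"},
    ]},
    {"car_id": "c2", "_partition_date": "2025-01-01", "vehicle_attributes": [
        {"attribute_type": "PAINT_COLOR", "attribute_code": "BLACK"},
    ]},
]
print(cars_per_attribute(rows)[("PAINT_COLOR", "BLACK")])  # → 2
```

In Polars, the same List(Struct) shape is usually flattened with DataFrame.explode followed by DataFrame.unnest, and then grouped on the two struct fields.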

Vehicle Featurization

Joins enriched inventory, feature classifications, and dealer dimensions into a normalized per-vehicle asset. This is the primary input to Take Rates: a strictly structured frame with one row per car carrying the vehicle_attributes list plus the join keys needed for grouping.

The output lands at <oem>/featurized_by_country.

Configuration

type: ai_core.components.VehicleEnrichmentComponent
attributes:
  oem: mb
  start_date: "2025-01-01"
  combination_attributes:
    - [PAINT_COLOR, FABRIC_COLOR]
    - [FABRIC_COLOR, TRIM_CATEGORY]
    - [PAINT_COLOR, FABRIC_MATERIAL]

Field | Description
combination_attributes | Attribute type pairs to combine into a synthetic composite attribute. Each pair produces an extra element in vehicle_attributes whose attribute_code is <code_A>__<code_B>.
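The following minimal sketch shows how one car's composite elements could be derived from combination_attributes. The page specifies only the attribute_code format (<code_A>__<code_B>). Everything else here is an assumption: the composite attribute_type of <type_A>__<type_B>, taking a cross product when a car has several codes for one type, and skipping a pair when either type is absent. add_combinations is a hypothetical name, and BLACK, BEIGE, and WOOD are made-up codes.

```python
from __future__ import annotations


def add_combinations(attributes: list[dict], combination_attributes: list[list[str]]) -> list[dict]:
    """Return attributes plus composite elements for each configured pair (sketch)."""
    codes_by_type: dict[str, list[str]] = {}
    for a in attributes:
        codes_by_type.setdefault(a["attribute_type"], []).append(a["attribute_code"])

    result = list(attributes)  # original elements are kept as-is
    for type_a, type_b in combination_attributes:
        # Assumption: nothing is emitted if the car lacks either type.
        for code_a in codes_by_type.get(type_a, []):
            for code_b in codes_by_type.get(type_b, []):
                result.append({
                    # Assumption: composite type name mirrors the code format.
                    "attribute_type": f"{type_a}__{type_b}",
                    "attribute_code": f"{code_a}__{code_b}",
                })
    return result


combos = [["PAINT_COLOR", "FABRIC_COLOR"], ["FABRIC_COLOR", "TRIM_CATEGORY"], ["PAINT_COLOR", "FABRIC_MATERIAL"]]
car = [
    {"attribute_type": "PAINT_COLOR", "attribute_code": "BLACK"},
    {"attribute_type": "FABRIC_COLOR", "attribute_code": "BEIGE"},
    {"attribute_type": "TRIM_CATEGORY", "attribute_code": "WOOD"},
]
for element in add_combinations(car, combos)[len(car):]:
    print(element["attribute_code"])  # prints BLACK__BEIGE, then BEIGE__WOOD
```

In this example the [PAINT_COLOR, FABRIC_MATERIAL] pair contributes nothing because the car has no FABRIC_MATERIAL element.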

Inputs

The component always reads four upstream assets for the same partition:

  • <oem>/inventory_enriched — enriched inventory (provides car_id, model_id, dealer_id, and metric columns)
  • <oem>/feature_enriched — classified options (provides vehicle_attributes)
  • <oem>/consolidated/dealers — dealer dimensions (provides country)
  • <oem>/inventory_stats/country — country-level statistics joined onto each vehicle for use by Attribute Statistics
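The narrow output can be pictured as a car_id join of the first two inputs, followed by a projection onto the output schema. This plain-Python sketch is illustrative only. assemble is a hypothetical name. The sketch treats inventory_enriched as the driving side of a left join and gives cars without classified features an empty list; both are assumptions. It leaves out the dealer and country-statistics inputs because none of their columns appear in the output schema.

```python
from __future__ import annotations

OUTPUT_COLUMNS = ("car_id", "model_id", "dealer_id", "vehicle_attributes", "_partition_date")


def assemble(inventory_rows: list[dict], feature_rows: list[dict]) -> list[dict]:
    """Left-join feature_enriched onto inventory_enriched and keep only the output columns."""
    attrs_by_car = {r["car_id"]: r["vehicle_attributes"] for r in feature_rows}
    out = []
    for inv in inventory_rows:
        merged = dict(inv)
        # Assumption: a car with no classified features gets an empty list.
        merged["vehicle_attributes"] = attrs_by_car.get(inv["car_id"], [])
        # Metric columns such as days_on_lot are dropped by this projection.
        out.append({col: merged[col] for col in OUTPUT_COLUMNS})
    return out
```

For example, an inventory row that also carries days_on_lot comes out with exactly the five output columns.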

Output schema

[car_id, model_id, dealer_id, vehicle_attributes, _partition_date]

The schema is intentionally narrow. All filter, weight, and metric columns (e.g. is_sold_in_window, hl90_weight, days_on_lot) are kept in <oem>/inventory_enriched and joined onto this asset by Take Rates via its inventory input — not embedded here.
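On the Take Rates side, the metric columns come back through a join with the inventory input. The following is a minimal sketch. It assumes (car_id, _partition_date) as the join key and inner-join semantics. attach_inventory_columns is a hypothetical helper, not the Take Rates API.

```python
from __future__ import annotations


def attach_inventory_columns(featurized: list[dict], inventory: list[dict], columns: list[str]) -> list[dict]:
    """Join selected inventory_enriched columns onto featurized rows (sketch)."""
    # Assumption: (car_id, _partition_date) identifies one inventory row.
    by_key = {(r["car_id"], r["_partition_date"]): r for r in inventory}
    joined = []
    for row in featurized:
        inv = by_key.get((row["car_id"], row["_partition_date"]))
        if inv is None:
            continue  # Assumption: inner join, so unmatched cars are dropped.
        joined.append({**row, **{col: inv[col] for col in columns}})
    return joined
```

Only the requested columns (for example is_sold_in_window and hl90_weight) are pulled across. Other inventory columns stay in <oem>/inventory_enriched.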