Vehicle Features
Classifies vehicle options into structured attribute types and assembles a normalized per-vehicle asset that drives take-rate aggregations.
Feature Enrichment
Classifies vehicle options and packages into structured attribute types (PAINT_COLOR, OPTION_SOUND_SYSTEM, etc.) so downstream aggregations can group by attribute code without parsing raw option strings. The component reads consolidated features and emits a <oem>/feature_enriched asset with a vehicle_attributes column of type List(Struct({attribute_type, attribute_code})).
Attribute specs
FeaturizationComponent ships with built-in specs for common attribute types (paint colors, fabric colors, wheel categories, option and package categories). OEM-specific specs are added via additional_attributes:
type: ai_core.components.FeaturizationComponent
attributes:
  oem: mb
  name: feature_enriched
  start_date: "2025-01-01"
  additional_attributes:
    - feature_type: OPTION
      attribute_name: OPTION_SOUND_SYSTEM
      field: name
      regexes:
        - value: UPGRADED
          regex: "(burmester )"
| Field | Description |
|---|---|
| feature_type | Broad category; maps to the attribute_type column in the output |
| attribute_name | Specific attribute subtype (e.g. OPTION_SOUND_SYSTEM) |
| field | Feature field to match the regex against (e.g. name, code) |
| regexes | List of {value, regex} pairs; the first matching regex sets attribute_code to value |
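The first-match rule for regexes can be sketched in plain Python. This is an illustration under stated assumptions, not the component's implementation: the dict layout of the spec, the `classify` helper, the sample feature, and the choice of an unanchored, case-insensitive search are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical in-memory form of the YAML spec; the dict layout is an assumption.
SOUND_SYSTEM_SPEC = {
    "feature_type": "OPTION",
    "attribute_name": "OPTION_SOUND_SYSTEM",
    "field": "name",
    "regexes": [
        {"value": "UPGRADED", "regex": "(burmester )"},
    ],
}


def classify(feature: dict, spec: dict) -> Optional[str]:
    """Return the value of the first regex that matches the configured field, else None."""
    text = feature.get(spec["field"]) or ""
    for rule in spec["regexes"]:
        # Assumption: unanchored, case-insensitive matching.
        if re.search(rule["regex"], text, flags=re.IGNORECASE):
            return rule["value"]  # first match wins; later rules are not evaluated
    return None


# Made-up feature record for illustration.
feature = {"code": "810", "name": "Burmester Surround Sound System"}
attribute_code = classify(feature, SOUND_SYSTEM_SPEC)
```

Because the first match wins, the order of entries under regexes matters: more specific patterns would need to come before broader ones.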
Output schema
[car_id, vehicle_attributes, _partition_date]
vehicle_attributes is a native Polars List(Struct) — each element carries attribute_type and attribute_code. This is the format expected by Take Rates and Vehicle Featurization.
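To show why the list-of-structs shape is convenient for grouping, here is a minimal sketch that uses plain Python dicts as a stand-in for the Polars frame. The rows and attribute values are made up, and the counting step only illustrates grouping by attribute; it is not the Take Rates logic.

```python
from collections import Counter

# Hypothetical rows shaped like [car_id, vehicle_attributes, _partition_date].
rows = [
    {
        "car_id": "c1",
        "vehicle_attributes": [
            {"attribute_type": "PAINT_COLOR", "attribute_code": "BLACK"},
            {"attribute_type": "OPTION_SOUND_SYSTEM", "attribute_code": "UPGRADED"},
        ],
        "_partition_date": "2025-01-01",
    },
    {
        "car_id": "c2",
        "vehicle_attributes": [
            {"attribute_type": "PAINT_COLOR", "attribute_code": "BLACK"},
        ],
        "_partition_date": "2025-01-01",
    },
]

# Flatten the list column and count vehicles per (attribute_type, attribute_code).
counts = Counter(
    (attr["attribute_type"], attr["attribute_code"])
    for row in rows
    for attr in row["vehicle_attributes"]
)
```

No option string is parsed at this stage; the grouping key comes straight from the struct fields.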
Vehicle Featurization
Joins enriched inventory, feature classifications, and dealer dimensions into a normalized per-vehicle asset. This is the primary input to Take Rates: a strictly structured frame with one row per car carrying the vehicle_attributes list plus the join keys needed for grouping.
The output lands at <oem>/featurized_by_country.
Configuration
type: ai_core.components.VehicleEnrichmentComponent
attributes:
  oem: mb
  start_date: "2025-01-01"
  combination_attributes:
    - [PAINT_COLOR, FABRIC_COLOR]
    - [FABRIC_COLOR, TRIM_CATEGORY]
    - [PAINT_COLOR, FABRIC_MATERIAL]
| Field | Description |
|---|---|
| combination_attributes | Attribute type pairs to combine into a synthetic composite attribute. Each pair produces an extra element in vehicle_attributes whose attribute_code is <code_A>__<code_B> |
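The composite rule can be sketched as follows. Only the `<code_A>__<code_B>` format of attribute_code comes from the description above. The `add_combinations` helper, the composite attribute_type name, skipping pairs when one side is missing, and taking the cross product when a vehicle carries several codes of one type are assumptions made for illustration.

```python
from itertools import product


def add_combinations(vehicle_attributes, combination_attributes):
    """Return a new list with one composite element per configured pair found on the vehicle."""
    codes_by_type = {}
    for attr in vehicle_attributes:
        codes_by_type.setdefault(attr["attribute_type"], []).append(attr["attribute_code"])

    combined = list(vehicle_attributes)
    for type_a, type_b in combination_attributes:
        # Assumption: a pair yields nothing when either attribute type is absent.
        codes_a = codes_by_type.get(type_a, [])
        codes_b = codes_by_type.get(type_b, [])
        for code_a, code_b in product(codes_a, codes_b):
            combined.append({
                # Assumption: the composite type name joins both types with "__".
                "attribute_type": f"{type_a}__{type_b}",
                "attribute_code": f"{code_a}__{code_b}",
            })
    return combined


# Made-up attributes for one vehicle.
attrs = [
    {"attribute_type": "PAINT_COLOR", "attribute_code": "BLACK"},
    {"attribute_type": "FABRIC_COLOR", "attribute_code": "BEIGE"},
]
pairs = [["PAINT_COLOR", "FABRIC_COLOR"], ["FABRIC_COLOR", "TRIM_CATEGORY"]]
result = add_combinations(attrs, pairs)
```

In this example only the first pair produces a composite element, because the vehicle has no TRIM_CATEGORY attribute.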
Inputs
The component reads the following upstream assets for the same partition:
- <oem>/inventory_enriched: enriched inventory (provides car_id, model_id, dealer_id, and metric columns)
- <oem>/feature_enriched: classified options (provides vehicle_attributes)
- <oem>/consolidated/dealers: dealer dimensions (provides country)
- <oem>/inventory_stats/country: country-level statistics joined onto each vehicle for use by Attribute Statistics
Output schema
[car_id, model_id, dealer_id, vehicle_attributes, _partition_date]
The schema is intentionally narrow. All filter, weight, and metric columns (e.g. is_sold_in_window, hl90_weight, days_on_lot) are kept in <oem>/inventory_enriched and joined onto this asset by Take Rates via its inventory input — not embedded here.
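A consumer that needs the filter, weight, and metric columns would re-attach them from the inventory asset. The sketch below uses plain Python dicts as a stand-in for the actual frames. The join key (car_id), the inner-join behaviour, and all sample values are assumptions; the real Take Rates join is not shown in this document.

```python
# Hypothetical rows from <oem>/featurized_by_country (narrow schema).
featurized = [
    {
        "car_id": "c1",
        "model_id": "m1",
        "dealer_id": "d1",
        "vehicle_attributes": [
            {"attribute_type": "PAINT_COLOR", "attribute_code": "BLACK"},
        ],
        "_partition_date": "2025-01-01",
    },
    {
        "car_id": "c2",
        "model_id": "m1",
        "dealer_id": "d2",
        "vehicle_attributes": [],
        "_partition_date": "2025-01-01",
    },
]

# Hypothetical rows from <oem>/inventory_enriched (metric columns only, for brevity).
inventory = [
    {"car_id": "c1", "is_sold_in_window": True, "hl90_weight": 0.8, "days_on_lot": 12},
]

# Index inventory by the assumed join key.
inventory_by_car = {row["car_id"]: row for row in inventory}

joined = []
for row in featurized:
    metrics = inventory_by_car.get(row["car_id"])
    if metrics is None:
        continue  # assumption: inner join, vehicles without inventory rows are dropped
    joined.append({**metrics, **row})
```

Keeping the metric columns in one asset means a change to a weight or filter definition does not require rebuilding the featurized asset.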