Days on Lot
Predicts how long a vehicle will sit on the lot before selling. Two assets ship together under one Dagster group:
- <oem>/days_on_lot_model: trained sklearn Pipeline, refit per partition (MLModelComponent).
- <oem>/days_on_lot_predictions: per-vehicle scored output (car_id, predicted_days_on_lot, _partition_date) (ModelInferenceComponent).
Both components share the same feature assembly logic from ai_core.ml, so training and inference see identical inputs.
Training (MLModelComponent)
Joins enriched inventory + reference tables + named inventory-statistics tables into one feature matrix, then fits the configured estimator. KPI columns from each StatsInputSpec are prefixed with {name}__ and referenced from feature_columns via source: <name>.
type: ai_core.components.MLModelComponent
attributes:
  oem: mb
  name: days_on_lot_model
  start_date: "2025-01-01"
  group: days_on_lot
  task_type: regression
  model_type: xgboost
  label_column: days_on_lot
  inventory: mb/inventory_enriched
  joins:
    - asset: mb/dealers/neighborhoods
      join_on: [dealer_id]
      fields: [neighborhood]
  stats_lookback_days: 365
  stats_inputs:
    - name: stats_by_dealer_neighborhood
      asset: mb/inventory_stats/dealer_neighborhood
      join_on: [dealer_id, neighborhood]
  feature_columns:
    - { column: days_supply_90D, source: stats_by_dealer_neighborhood }
    - { column: avg_days_on_lot_90D, source: stats_by_dealer_neighborhood }
    - { column: neighborhood, source: dealers }
  hyperparameters:
    n_estimators: 15
    max_depth: 14
    learning_rate: 0.5
| Field | Description |
|---|---|
| group | Dagster UI group. Share it with the inference component so the model and its predictions cluster together |
| model_type | linear, xgboost, or random_forest (ModelType StrEnum) |
| task_type | regression or classification |
| label_column | Prediction target column |
| feature_columns | Per-column (column, source) pairs, where source is inventory, dealers, models, or a stats_inputs name |
| stats_inputs | Statistics tables joined as features. Non-key columns get a {name}__ prefix |
| stats_lookback_days | Number of past stats partitions to load. A value > 0 enables point-in-time joins keyed on arrival_date (set it to the maximum vehicle age) |
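
The {name}__ prefixing and the source lookup in feature_columns can be illustrated with plain dictionaries. This is a minimal sketch, not the ai_core.ml implementation: the helpers prefix_stats_row and resolve_feature_name are hypothetical, the sample values are made up, and it assumes that inventory and reference-table columns keep their original names.

```python
def prefix_stats_row(row, name, join_on):
    """Prefix every non-key column of a stats row with '{name}__'."""
    return {
        (col if col in join_on else f"{name}__{col}"): value
        for col, value in row.items()
    }


def resolve_feature_name(feature, stats_names):
    """Map a (column, source) pair to its column name in the feature matrix.

    Assumption: only stats_inputs sources are prefixed; inventory and
    reference-table columns keep their original names.
    """
    if feature["source"] in stats_names:
        return f"{feature['source']}__{feature['column']}"
    return feature["column"]


# Illustrative stats row (values are invented for the example).
stats_row = {
    "dealer_id": "D1",
    "neighborhood": "north",
    "days_supply_90D": 42.0,
    "avg_days_on_lot_90D": 31.5,
}
prefixed = prefix_stats_row(
    stats_row,
    name="stats_by_dealer_neighborhood",
    join_on=["dealer_id", "neighborhood"],
)

feature_columns = [
    {"column": "days_supply_90D", "source": "stats_by_dealer_neighborhood"},
    {"column": "neighborhood", "source": "dealers"},
]
resolved = [
    resolve_feature_name(f, {"stats_by_dealer_neighborhood"})
    for f in feature_columns
]
```

Join keys stay unprefixed so the stats table can still be merged on them, while the prefix keeps KPI columns from two different stats tables from colliding.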
Point-in-time stats joins
When stats_lookback_days > 0 and arrival_date is present, multiple stats partitions are loaded and each vehicle is joined to the latest stats row whose _partition_date <= arrival_date. The vehicle therefore only sees market state that was known when it arrived, which avoids leaking future market state into the training data. Labelled rows missing arrival_date are dropped and reported by the training_data_quality asset check.
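The join rule above is a backward as-of join. The following pure-Python sketch shows the idea on lists of dicts; it is not the ai_core.ml code. The function point_in_time_join and the sample rows are hypothetical, and for simplicity it drops every row without arrival_date rather than only labelled ones. In pandas the same rule corresponds to merge_asof with direction="backward".

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def point_in_time_join(vehicles, stats, keys):
    """Attach to each vehicle the latest stats row with
    _partition_date <= arrival_date, matched on `keys`."""
    # Group stats rows by join key, each group sorted by partition date.
    by_key = defaultdict(list)
    for row in sorted(stats, key=lambda r: r["_partition_date"]):
        by_key[tuple(row[k] for k in keys)].append(row)

    joined, dropped = [], 0
    for vehicle in vehicles:
        arrival = vehicle.get("arrival_date")
        if arrival is None:
            dropped += 1  # no arrival_date, so no point-in-time match is possible
            continue
        candidates = by_key.get(tuple(vehicle[k] for k in keys), [])
        dates = [r["_partition_date"] for r in candidates]
        idx = bisect_right(dates, arrival)  # count of partitions <= arrival
        # No earlier partition means the vehicle gets no stats columns.
        match = candidates[idx - 1] if idx else {}
        extra = {
            col: val
            for col, val in match.items()
            if col not in keys and col != "_partition_date"
        }
        joined.append({**vehicle, **extra})
    return joined, dropped


stats = [
    {"dealer_id": "D1", "_partition_date": date(2025, 1, 1), "avg_days_on_lot_90D": 30.0},
    {"dealer_id": "D1", "_partition_date": date(2025, 2, 1), "avg_days_on_lot_90D": 45.0},
]
vehicles = [
    {"car_id": "A", "dealer_id": "D1", "arrival_date": date(2025, 1, 15)},
    {"car_id": "B", "dealer_id": "D1", "arrival_date": date(2025, 2, 1)},
    {"car_id": "C", "dealer_id": "D1", "arrival_date": None},
]
joined, dropped = point_in_time_join(vehicles, stats, keys=["dealer_id"])
```

Vehicle A arrives between the two partitions and picks up the January stats; vehicle B arrives on the February partition date and picks up February, because the comparison is inclusive.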
Output asset
The training asset holds one row per partition, containing the pickled Pipeline, feature column metadata, and train/val R²:
model_type, task_type, label_column, feature_columns_json,
training_row_count, train_r2, val_r2, serialized_model, _partition_date
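A consumer of this asset might pick the newest partition and decode its columns roughly as follows. This is a sketch under several assumptions that the source does not confirm: load_latest_model and make_row are hypothetical helpers, _partition_date is taken to be an ISO date string, feature_columns_json is taken to be a JSON list of {column, source} objects, and a plain dict stands in for the sklearn Pipeline so the example runs without scikit-learn.

```python
import json
import pickle


def load_latest_model(rows):
    """Return (model, feature_columns) from the row with the newest _partition_date."""
    # ISO-8601 date strings (YYYY-MM-DD) sort chronologically as plain strings.
    latest = max(rows, key=lambda r: r["_partition_date"])
    model = pickle.loads(latest["serialized_model"])
    feature_columns = json.loads(latest["feature_columns_json"])
    return model, feature_columns


def make_row(partition_date, stand_in_model):
    """Build an illustrative artifact row with a subset of the asset's columns."""
    return {
        "model_type": "xgboost",
        "task_type": "regression",
        "label_column": "days_on_lot",
        "feature_columns_json": json.dumps(
            [{"column": "neighborhood", "source": "dealers"}]
        ),
        "serialized_model": pickle.dumps(stand_in_model),
        "_partition_date": partition_date,
    }


rows = [
    make_row("2025-01-01", {"version": 1}),
    make_row("2025-01-02", {"version": 2}),
]
model, feature_columns = load_latest_model(rows)
```

Unpickling executes code embedded in the payload, so pickle.loads should only be applied to artifacts produced by a trusted training run.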
Inference (ModelInferenceComponent)
Loads the latest pickled Pipeline from the training asset, assembles the same feature matrix using the same joins and stats_inputs as training, scores every active vehicle, and writes (car_id, predicted_<label>, _partition_date).
type: ai_core.components.ModelInferenceComponent
attributes:
  oem: mb
  name: days_on_lot_predictions
  start_date: "2025-01-01"
  group: days_on_lot
  model_asset: mb/days_on_lot_model
  inventory: mb/inventory_enriched
  joins:
    - asset: mb/dealers/neighborhoods
      join_on: [dealer_id]
      fields: [neighborhood]
  stats_inputs:
    - name: stats_by_dealer_neighborhood
      asset: mb/inventory_stats/dealer_neighborhood
      join_on: [dealer_id, neighborhood]
joins and stats_inputs must mirror the training config exactly, because the model expects the same feature columns. feature_columns and label_column are read from the model artifact, so they do not need to be repeated.
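One way to guard against the two configs drifting apart is a small equality check over the shared attributes, for example in a unit test. The sketch below is illustrative only: config_mismatches is a hypothetical helper, and the dictionaries are hand-written copies of the attributes blocks shown above rather than anything loaded by ai_core.

```python
import copy


def config_mismatches(training_attrs, inference_attrs, keys=("joins", "stats_inputs")):
    """Return the attribute names whose values differ between the two configs."""
    return [
        key for key in keys
        if training_attrs.get(key) != inference_attrs.get(key)
    ]


training_attrs = {
    "joins": [
        {
            "asset": "mb/dealers/neighborhoods",
            "join_on": ["dealer_id"],
            "fields": ["neighborhood"],
        }
    ],
    "stats_inputs": [
        {
            "name": "stats_by_dealer_neighborhood",
            "asset": "mb/inventory_stats/dealer_neighborhood",
            "join_on": ["dealer_id", "neighborhood"],
        }
    ],
}

# An inference config that mirrors training exactly.
inference_attrs = copy.deepcopy(training_attrs)
ok = config_mismatches(training_attrs, inference_attrs)

# An inference config whose stats join keys have drifted.
drifted = copy.deepcopy(training_attrs)
drifted["stats_inputs"][0]["join_on"] = ["dealer_id"]
bad = config_mismatches(training_attrs, drifted)
```

The comparison is deliberately strict: a reordered list counts as a mismatch, which matches the "mirror exactly" requirement but may be stricter than the components themselves need.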
Downstream usage
The predicted_days_on_lot column flows back into Take Rates via a joins entry, where derived_fields compute abnormal_days_on_lot = days_on_lot - predicted_days_on_lot. Take-rate aggregations then surface attributes whose vehicles consistently sell faster or slower than the model predicts.
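The arithmetic behind abnormal_days_on_lot can be sketched in a few lines. This is not the Take Rates implementation: mean_abnormal_days is a hypothetical helper, the color attribute and all values are invented, and a plain mean per attribute value is assumed as the aggregation.

```python
from collections import defaultdict
from statistics import mean


def mean_abnormal_days(rows, attribute):
    """Average (days_on_lot - predicted_days_on_lot) per value of `attribute`."""
    grouped = defaultdict(list)
    for row in rows:
        abnormal = row["days_on_lot"] - row["predicted_days_on_lot"]
        grouped[row[attribute]].append(abnormal)
    return {value: mean(diffs) for value, diffs in grouped.items()}


rows = [
    {"car_id": "A", "color": "red", "days_on_lot": 20, "predicted_days_on_lot": 30.0},
    {"car_id": "B", "color": "red", "days_on_lot": 25, "predicted_days_on_lot": 35.0},
    {"car_id": "C", "color": "blue", "days_on_lot": 50, "predicted_days_on_lot": 40.0},
]
result = mean_abnormal_days(rows, "color")
```

A negative value means vehicles with that attribute sold faster than the model predicted, and a positive value means they sold slower.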