
Days on Lot

Predicts how long a vehicle will sit on the lot before selling. Two assets ship together under one Dagster group:

  • <oem>/days_on_lot_model — trained sklearn Pipeline, refit per partition (MLModelComponent).
  • <oem>/days_on_lot_predictions — per-vehicle scored output (car_id, predicted_days_on_lot, _partition_date) (ModelInferenceComponent).

Both components share the same feature assembly logic from ai_core.ml, so training and inference see identical inputs.

Training (MLModelComponent)

Joins the enriched inventory, reference tables, and named inventory-statistics tables into one feature matrix, then fits the configured estimator. KPI columns from each StatsInputSpec are prefixed with {name}__ and referenced from feature_columns via source: <name>.

type: ai_core.components.MLModelComponent
attributes:
  oem: mb
  name: days_on_lot_model
  start_date: "2025-01-01"
  group: days_on_lot
  task_type: regression
  model_type: xgboost
  label_column: days_on_lot
  inventory: mb/inventory_enriched
  joins:
    - asset: mb/dealers/neighborhoods
      join_on: [dealer_id]
      fields: [neighborhood]
  stats_lookback_days: 365
  stats_inputs:
    - name: stats_by_dealer_neighborhood
      asset: mb/inventory_stats/dealer_neighborhood
      join_on: [dealer_id, neighborhood]
  feature_columns:
    - { column: days_supply_90D, source: stats_by_dealer_neighborhood }
    - { column: avg_days_on_lot_90D, source: stats_by_dealer_neighborhood }
    - { column: neighborhood, source: dealers }
  hyperparameters:
    n_estimators: 15
    max_depth: 14
    learning_rate: 0.5
| Field | Description |
|---|---|
| group | Dagster UI group. Share it with the inference component so model + predictions cluster together |
| model_type | linear, xgboost, or random_forest (ModelType StrEnum) |
| task_type | regression or classification |
| label_column | Prediction target column |
| feature_columns | Per-column (column, source) pairs. source is inventory, dealers, models, or a stats_inputs name |
| stats_inputs | Statistics tables joined as features. Non-key columns get {name}__ prefix |
| stats_lookback_days | Past partitions of stats to load. > 0 enables point-in-time joins keyed on arrival_date (set to max vehicle age) |
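
The {name}__ prefixing and the way a feature_columns entry resolves to a joined column can be sketched in plain Python. This is an illustration only, not the ai_core.ml implementation: the helper names (prefix_stats_columns, join_stats, resolve_feature_column) are hypothetical, and the sketch assumes that columns from non-stats sources keep their original names.

```python
def prefix_stats_columns(stats_row, name, join_on):
    """Prefix every non-key column with '{name}__'; join keys stay unprefixed."""
    return {
        (col if col in join_on else f"{name}__{col}"): value
        for col, value in stats_row.items()
    }


def join_stats(vehicles, stats_rows, name, join_on):
    """Left-join prefixed stats rows onto vehicle rows using the join_on keys."""
    index = {
        tuple(row[k] for k in join_on): prefix_stats_columns(row, name, join_on)
        for row in stats_rows
    }
    # Vehicles without a matching stats row are kept, just without stats columns.
    return [
        {**vehicle, **index.get(tuple(vehicle[k] for k in join_on), {})}
        for vehicle in vehicles
    ]


def resolve_feature_column(column, source, stats_names):
    """Map a (column, source) pair to a column name in the joined matrix.

    Assumption: only stats_inputs sources are prefixed.
    """
    return f"{source}__{column}" if source in stats_names else column


vehicles = [{"car_id": "c1", "dealer_id": "d1", "neighborhood": "north"}]
stats = [{
    "dealer_id": "d1",
    "neighborhood": "north",
    "days_supply_90D": 42.0,
    "avg_days_on_lot_90D": 31.5,
}]
stats_name = "stats_by_dealer_neighborhood"

features = join_stats(vehicles, stats, stats_name, ["dealer_id", "neighborhood"])
resolved = resolve_feature_column("days_supply_90D", stats_name, {stats_name})
```

Prefixing keeps KPI columns from two different stats tables apart even when both expose a column with the same name.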

Point-in-time stats joins

When stats_lookback_days > 0 and arrival_date is present, multiple stats partitions are loaded and each vehicle is joined to the latest stats row whose _partition_date <= arrival_date. The model therefore only sees market state that was known when the vehicle arrived, which avoids leaking future market state into training data. Labelled rows missing arrival_date are dropped and reported by the training_data_quality asset check.
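
A minimal sketch of the point-in-time rule, using only the standard library. The function name point_in_time_join and the row layout are assumptions made for illustration. The sketch also simplifies two things: it drops every row without arrival_date (not only labelled ones), and it keeps vehicles that have no earlier stats row, just without stats values.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def point_in_time_join(vehicles, stats_rows, join_on):
    """Attach the latest stats row with _partition_date <= arrival_date."""
    # Group stats partitions by join key, oldest first.
    by_key = defaultdict(list)
    for row in stats_rows:
        by_key[tuple(row[k] for k in join_on)].append(row)
    for rows in by_key.values():
        rows.sort(key=lambda r: r["_partition_date"])

    joined, dropped = [], 0
    for vehicle in vehicles:
        arrival = vehicle.get("arrival_date")
        if arrival is None:
            dropped += 1  # the real pipeline reports these via an asset check
            continue
        rows = by_key.get(tuple(vehicle[k] for k in join_on), [])
        dates = [r["_partition_date"] for r in rows]
        # bisect_right returns the count of partitions dated <= arrival.
        pos = bisect_right(dates, arrival)
        match = rows[pos - 1] if pos else {}
        stats_values = {
            k: v for k, v in match.items()
            if k not in join_on and k != "_partition_date"
        }
        joined.append({**vehicle, **stats_values})
    return joined, dropped


stats = [
    {"dealer_id": "d1", "_partition_date": date(2025, 1, 1), "days_supply_90D": 40},
    {"dealer_id": "d1", "_partition_date": date(2025, 2, 1), "days_supply_90D": 50},
    {"dealer_id": "d1", "_partition_date": date(2025, 3, 1), "days_supply_90D": 60},
]
vehicles = [
    {"car_id": "c1", "dealer_id": "d1", "arrival_date": date(2025, 2, 15)},
    {"car_id": "c2", "dealer_id": "d1", "arrival_date": date(2024, 12, 1)},
    {"car_id": "c3", "dealer_id": "d1", "arrival_date": None},
]

joined, dropped = point_in_time_join(vehicles, stats, ["dealer_id"])
```

Here c1 picks up the 2025-02-01 partition rather than the later 2025-03-01 one, c2 arrived before any stats partition and gets no stats values, and c3 is dropped.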

Output asset

The training asset contains one row per partition, holding the pickled Pipeline, feature column metadata, and train/validation R²:

model_type, task_type, label_column, feature_columns_json,
training_row_count, train_r2, val_r2, serialized_model, _partition_date
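
As an illustration of how a consumer might read this asset, the sketch below picks the most recent partition and deserializes it. It is not the component's loader: load_latest_model is a hypothetical name, _partition_date is assumed to be an ISO date string, feature_columns_json is assumed to hold a JSON list of {column, source} objects, and a small dict stands in for the pickled sklearn Pipeline so the example runs without extra dependencies.

```python
import json
import pickle


def load_latest_model(rows):
    """Return (model, feature_columns, label_column) from the newest partition."""
    # ISO date strings sort chronologically, so max() finds the latest partition.
    latest = max(rows, key=lambda r: r["_partition_date"])
    # Only unpickle artifacts produced by a trusted pipeline.
    model = pickle.loads(latest["serialized_model"])
    feature_columns = json.loads(latest["feature_columns_json"])
    return model, feature_columns, latest["label_column"]


def make_row(partition_date, payload):
    """Build a fake training-asset row; the payload stands in for the Pipeline."""
    return {
        "label_column": "days_on_lot",
        "feature_columns_json": json.dumps(
            [{"column": "neighborhood", "source": "dealers"}]
        ),
        "serialized_model": pickle.dumps(payload),
        "_partition_date": partition_date,
    }


rows = [
    make_row("2025-01-01", {"stand_in": "older"}),
    make_row("2025-01-02", {"stand_in": "newer"}),
]

model, feature_columns, label_column = load_latest_model(rows)
```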

Inference (ModelInferenceComponent)

Loads the latest pickled Pipeline from the training asset, assembles the same feature matrix using the same joins and stats_inputs as training, scores every active vehicle, and writes (car_id, predicted_<label>, _partition_date).

type: ai_core.components.ModelInferenceComponent
attributes:
  oem: mb
  name: days_on_lot_predictions
  start_date: "2025-01-01"
  group: days_on_lot
  model_asset: mb/days_on_lot_model
  inventory: mb/inventory_enriched
  joins:
    - asset: mb/dealers/neighborhoods
      join_on: [dealer_id]
      fields: [neighborhood]
  stats_inputs:
    - name: stats_by_dealer_neighborhood
      asset: mb/inventory_stats/dealer_neighborhood
      join_on: [dealer_id, neighborhood]

joins and stats_inputs must mirror the training config exactly — the model expects the same feature columns. feature_columns and label_column are read from the model artifact, so they don't need repeating.
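
One way to guard against drift between the two configs is a small check over the parsed attributes blocks. This is a hypothetical helper, not part of ai_core; it assumes both YAML files have already been parsed into plain dicts and compares only the two keys named above.

```python
def config_mismatches(training_attrs, inference_attrs, keys=("joins", "stats_inputs")):
    """Return the keys whose values differ between training and inference."""
    return [k for k in keys if training_attrs.get(k) != inference_attrs.get(k)]


# Parsed 'attributes' blocks, trimmed to the keys that must match.
training = {
    "joins": [
        {"asset": "mb/dealers/neighborhoods", "join_on": ["dealer_id"], "fields": ["neighborhood"]},
    ],
    "stats_inputs": [
        {
            "name": "stats_by_dealer_neighborhood",
            "asset": "mb/inventory_stats/dealer_neighborhood",
            "join_on": ["dealer_id", "neighborhood"],
        },
    ],
}
inference_ok = {"joins": list(training["joins"]), "stats_inputs": list(training["stats_inputs"])}
inference_bad = {"joins": list(training["joins"]), "stats_inputs": []}

matching = config_mismatches(training, inference_ok)
drifted = config_mismatches(training, inference_bad)
```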

Downstream usage

The predicted_days_on_lot column flows back into Take Rates via a joins entry, where derived_fields compute abnormal_days_on_lot = days_on_lot - predicted_days_on_lot. Take-rate aggregations then surface attributes whose vehicles consistently sell faster or slower than the model predicts.
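
The arithmetic behind abnormal_days_on_lot, and the kind of per-attribute roll-up it enables, can be sketched as follows. The derived_fields syntax in Take Rates is not shown here; the function names, the color attribute, and the sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def abnormal_days_on_lot(row):
    """Actual minus predicted: negative means the vehicle sold faster than predicted."""
    return row["days_on_lot"] - row["predicted_days_on_lot"]


def mean_abnormal_by(rows, attribute):
    """Average abnormal_days_on_lot for each value of the given attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(abnormal_days_on_lot(row))
    return {value: mean(deltas) for value, deltas in groups.items()}


sold = [
    {"car_id": "c1", "color": "red", "days_on_lot": 20, "predicted_days_on_lot": 30},
    {"car_id": "c2", "color": "red", "days_on_lot": 25, "predicted_days_on_lot": 35},
    {"car_id": "c3", "color": "blue", "days_on_lot": 50, "predicted_days_on_lot": 40},
]

by_color = mean_abnormal_by(sold, "color")
```

In this toy data, red vehicles sell 10 days faster than predicted on average and blue vehicles 10 days slower.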