Days on Lot

Predicts how long a vehicle will sit on the lot before selling. Two assets ship together under one Dagster group:

<oem>/days_on_lot_model (MLModelComponent): trained sklearn Pipeline, refit per partition.
<oem>/days_on_lot_predictions (ModelInferenceComponent): per-vehicle scored output with columns car_id, predicted_days_on_lot, and _partition_date.

Both components share the same feature assembly logic from ai_core.ml, so training and inference see identical inputs.

Training (`MLModelComponent`)

Joins the enriched inventory, the reference tables, and the named inventory-statistics tables into one feature matrix, then fits the configured estimator. KPI columns from each StatsInputSpec are prefixed with {name}__ in the joined matrix and are referenced from feature_columns via source: <name>.

type: ai_core.components.MLModelComponent
attributes:
  oem: mb
  name: days_on_lot_model
  start_date: "2025-01-01"
  group: days_on_lot
  task_type: regression
  model_type: xgboost
  label_column: days_on_lot
  inventory: mb/inventory_enriched
  joins:
    - asset: mb/dealers/neighborhoods
      join_on: [dealer_id]
      fields: [neighborhood]
  stats_lookback_days: 365
  stats_inputs:
    - name: stats_by_dealer_neighborhood
      asset: mb/inventory_stats/dealer_neighborhood
      join_on: [dealer_id, neighborhood]
  feature_columns:
    - { column: days_supply_90D,     source: stats_by_dealer_neighborhood }
    - { column: avg_days_on_lot_90D, source: stats_by_dealer_neighborhood }
    - { column: neighborhood,        source: dealers }
  hyperparameters:
    n_estimators: 15
    max_depth: 14
    learning_rate: 0.5

| Field | Description |
| --- | --- |
| `group` | Dagster UI group. Share it with the inference component so the model and predictions cluster together |
| `model_type` | `linear`, `xgboost`, or `random_forest` (`ModelType` StrEnum) |
| `task_type` | `regression` or `classification` |
| `label_column` | Prediction target column |
| `feature_columns` | Per-column `(column, source)` pairs. `source` is `inventory`, `dealers`, `models`, or a `stats_inputs` name |
| `stats_inputs` | Statistics tables joined as features. Non-key columns get the `{name}__` prefix |
| `stats_lookback_days` | How far back to load past stats partitions. `> 0` enables point-in-time joins keyed on `arrival_date` (set it to the maximum vehicle age in days) |
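
The `{name}__` prefixing of stats columns can be pictured with a small pandas sketch. It is an illustration under assumptions, not the ai_core.ml implementation: the helper name `prefix_stats_columns` and all sample values are hypothetical.

```python
import pandas as pd


def prefix_stats_columns(stats: pd.DataFrame, name: str, join_on: list[str]) -> pd.DataFrame:
    """Prefix every non-key column with '<name>__' (hypothetical helper)."""
    renames = {col: f"{name}__{col}" for col in stats.columns if col not in join_on}
    return stats.rename(columns=renames)


# Illustrative tables; the values are made up.
stats = pd.DataFrame(
    {
        "dealer_id": [1, 2],
        "neighborhood": ["north", "south"],
        "days_supply_90D": [42.0, 55.0],
    }
)
inventory = pd.DataFrame(
    {
        "car_id": ["a", "b"],
        "dealer_id": [1, 2],
        "neighborhood": ["north", "south"],
    }
)

join_on = ["dealer_id", "neighborhood"]
prefixed = prefix_stats_columns(stats, "stats_by_dealer_neighborhood", join_on)

# Join keys keep their names, so the merge still lines up on them.
features = inventory.merge(prefixed, on=join_on, how="left")
```

Prefixing only the non-key columns keeps two stats tables with the same KPI names from colliding once both are joined onto the inventory.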

Point-in-time stats joins

When stats_lookback_days > 0 and arrival_date is present, each vehicle is joined to the latest row in the loaded stats partitions whose _partition_date <= arrival_date. Because only market state known at or before arrival is used, future information cannot leak into the training data. Labelled rows missing arrival_date are dropped and reported by the training_data_quality asset check.
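
A backward as-of join is one way to express this rule. The sketch below uses `pandas.merge_asof` on made-up data; it assumes a single join key (`dealer_id`) for brevity and is not taken from the ai_core.ml code.

```python
import pandas as pd

# Illustrative data; the values are made up.
vehicles = pd.DataFrame(
    {
        "car_id": ["a", "b", "c"],
        "dealer_id": [1, 1, 2],
        "arrival_date": pd.to_datetime(["2025-01-10", "2025-02-01", None]),
    }
)
stats = pd.DataFrame(
    {
        "dealer_id": [1, 1, 1],
        "_partition_date": pd.to_datetime(["2025-01-05", "2025-01-20", "2025-02-15"]),
        "avg_days_on_lot_90D": [30.0, 35.0, 40.0],
    }
)

# Rows without arrival_date cannot be placed in time, so drop them first.
dated = vehicles.dropna(subset=["arrival_date"])

# merge_asof requires both frames to be sorted by their time key.
joined = pd.merge_asof(
    dated.sort_values("arrival_date"),
    stats.sort_values("_partition_date"),
    left_on="arrival_date",
    right_on="_partition_date",
    by="dealer_id",
    direction="backward",  # latest stats row with _partition_date <= arrival_date
)
```

In this data, vehicle `a` picks up the 2025-01-05 partition and vehicle `b` the 2025-01-20 partition; the 2025-02-15 partition is never used because it postdates both arrivals.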

Output asset

The training asset contains one row per partition, holding the pickled Pipeline, feature-column metadata, and train/validation R²:

model_type, task_type, label_column, feature_columns_json,
training_row_count, train_r2, val_r2, serialized_model, _partition_date
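
Reading one such row back might look like the sketch below. It rests on assumptions: the layout of `feature_columns_json` (a JSON list of column/source objects) is a guess, the values are made up, and a plain dict stands in for the fitted Pipeline so the example runs without sklearn.

```python
import json
import pickle

# Stand-in artifact row; all values are made up.
row = {
    "model_type": "xgboost",
    "task_type": "regression",
    "label_column": "days_on_lot",
    # Assumed layout: a JSON list of {column, source} objects.
    "feature_columns_json": json.dumps(
        [{"column": "neighborhood", "source": "dealers"}]
    ),
    "train_r2": 0.81,
    "val_r2": 0.64,
    # A dict replaces the fitted Pipeline in this sketch.
    "serialized_model": pickle.dumps({"stand_in": "fitted Pipeline"}),
}

feature_columns = json.loads(row["feature_columns_json"])

# Only unpickle bytes from a trusted source: pickle.loads can execute code.
model = pickle.loads(row["serialized_model"])

# A large train/validation gap is a common sign of overfitting.
r2_gap = row["train_r2"] - row["val_r2"]
```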

Inference (`ModelInferenceComponent`)

Loads the latest pickled Pipeline from the training asset, assembles the same feature matrix using the same joins and stats_inputs as training, scores every active vehicle, and writes (car_id, predicted_<label>, _partition_date).

type: ai_core.components.ModelInferenceComponent
attributes:
  oem: mb
  name: days_on_lot_predictions
  start_date: "2025-01-01"
  group: days_on_lot
  model_asset: mb/days_on_lot_model
  inventory: mb/inventory_enriched
  joins:
    - asset: mb/dealers/neighborhoods
      join_on: [dealer_id]
      fields: [neighborhood]
  stats_inputs:
    - name: stats_by_dealer_neighborhood
      asset: mb/inventory_stats/dealer_neighborhood
      join_on: [dealer_id, neighborhood]

joins and stats_inputs must mirror the training config exactly — the model expects the same feature columns. feature_columns and label_column are read from the model artifact, so they don't need repeating.
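
Because the mirroring is maintained by hand, a small drift check can catch mismatches early. The sketch below compares the two `attributes` mappings as plain dicts, as they would look after parsing the YAML; the function name `mirrored_keys_differ` is hypothetical and not part of ai_core.

```python
def mirrored_keys_differ(training: dict, inference: dict) -> list[str]:
    """Return the attribute names whose values differ between the two configs."""
    return [
        key
        for key in ("joins", "stats_inputs")
        if training.get(key) != inference.get(key)
    ]


# The shared parts of the two configs shown in this page, as parsed dicts.
shared = {
    "joins": [
        {
            "asset": "mb/dealers/neighborhoods",
            "join_on": ["dealer_id"],
            "fields": ["neighborhood"],
        }
    ],
    "stats_inputs": [
        {
            "name": "stats_by_dealer_neighborhood",
            "asset": "mb/inventory_stats/dealer_neighborhood",
            "join_on": ["dealer_id", "neighborhood"],
        }
    ],
}
training_attrs = {"name": "days_on_lot_model", "stats_lookback_days": 365, **shared}
inference_attrs = {"name": "days_on_lot_predictions", **shared}

# An empty list means joins and stats_inputs match.
drift = mirrored_keys_differ(training_attrs, inference_attrs)
```

Only `joins` and `stats_inputs` are compared; training-only fields such as `stats_lookback_days` are deliberately left out.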

Downstream usage

The predicted_days_on_lot column flows back into Take Rates via a joins entry, where derived_fields compute abnormal_days_on_lot = days_on_lot - predicted_days_on_lot. Take-rate aggregations then surface attributes whose vehicles consistently sell faster or slower than the model predicts.
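
The residual and a follow-on aggregation can be sketched in pandas as below. The data is made up, and the `color` attribute and the mean aggregation are illustrative assumptions rather than the Take Rates configuration.

```python
import pandas as pd

# Illustrative data; the values are made up.
vehicles = pd.DataFrame(
    {
        "car_id": ["a", "b", "c"],
        "color": ["red", "blue", "red"],
        "days_on_lot": [40, 12, 75],
    }
)
predictions = pd.DataFrame(
    {
        "car_id": ["a", "b", "c"],
        "predicted_days_on_lot": [30.0, 20.0, 70.0],
    }
)

joined = vehicles.merge(predictions, on="car_id", how="left")

# Positive: sold slower than predicted. Negative: sold faster.
joined["abnormal_days_on_lot"] = (
    joined["days_on_lot"] - joined["predicted_days_on_lot"]
)

# Average residual per attribute value.
by_color = joined.groupby("color")["abnormal_days_on_lot"].mean()
```

Here the red vehicles average 7.5 days slower than predicted, while the blue vehicle sold 8 days faster.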
