PPM Data Platform

Multi-OEM vehicle data platform built with Dagster. Shared logic lives in packages/ai_core; each OEM gets its own Dagster project under projects/.

  • Getting Started — Installation, running Dagster locally, materializing assets, monitoring runs, running tests
  • Dagster Execution Model — How the ECS executor works, run task vs step tasks, resource sizing, container_context.yaml configuration
  • Monitoring — Sentry alerting for asset check failures and run crashes
  • Debugging ECS Containers — Using ECS Exec / SSM to shell into Fargate containers (Nessie, code locations)

Ingestion

  • Design — Three-tier pipeline, asset key conventions, checks, partitioning
  • Code Structure — Package layout, component classes, YAML config, adding a new OEM
  • Component Reference — Advanced component attributes, raw tier schema, utilities
  • ConsolidatedComponent — Source groups, gap fill, merge spec, foreign keys
  • Operations — Running Dagster, materializing assets, monitoring runs

Derived

Downstream of consolidated ingestion, the platform produces a graph of OEM-specific assets that turn raw entity records into KPIs, predictions, and smoothed dealer-level take rates. Each asset is partitioned by day and emitted by a reusable component class.

Asset keys always start with <oem>/. Most use the two-segment <oem>/<name> pattern; dealer classifications add a dealers segment and land under <oem>/dealers/<name> (e.g. mb/dealers/neighborhoods). Every asset uses DailyPartitionsDefinition with end_offset=1, matching the consolidated tier.
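
As a rough illustration of these conventions, the sketch below builds key paths and lists daily partition keys using only the standard library. It is not the platform's code: derived_asset_key, daily_partition_keys, the asset name take_rates, and the start date are all hypothetical, and the partition helper only mimics how end_offset=1 extends a daily partition set by one day so that the current, still-incomplete day is included.

```python
from datetime import date, timedelta

DEALERS_SEGMENT = "dealers"  # middle segment used for dealer classifications


def derived_asset_key(oem: str, name: str, dealer_classification: bool = False) -> list[str]:
    """Return the key path segments for a derived asset (hypothetical helper)."""
    for part in (oem, name):
        if not part or "/" in part:
            raise ValueError(f"invalid key segment: {part!r}")
    if dealer_classification:
        return [oem, DEALERS_SEGMENT, name]  # <oem>/dealers/<name>
    return [oem, name]  # <oem>/<name>


def daily_partition_keys(start: date, today: date, end_offset: int = 1) -> list[str]:
    """List daily partition keys from start (hypothetical helper).

    With end_offset=0 the last key is the last complete day (yesterday).
    With end_offset=1 the set grows by one day, so today is included.
    """
    last = today - timedelta(days=1) + timedelta(days=end_offset)
    count = max((last - start).days + 1, 0)
    return [(start + timedelta(days=i)).isoformat() for i in range(count)]


if __name__ == "__main__":
    print("/".join(derived_asset_key("mb", "take_rates")))
    print("/".join(derived_asset_key("mb", "neighborhoods", dealer_classification=True)))
    print(daily_partition_keys(date(2024, 1, 1), date(2024, 1, 3)))
```

In a real Dagster project the segment list would be passed to AssetKey and the partitioning would come from DailyPartitionsDefinition itself; the helpers here only make the naming and offset rules concrete.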

  • Enrichment — Spatial dealer clustering, plus computed sold-date, days-on-lot, and half-life weights on consolidated inventory
  • Vehicle Features — Option classification into structured attribute types and per-vehicle feature assembly
  • Inventory Statistics — Current inventory count, rolling sales counts, days-supply, and average days-on-lot per grouping
  • Days on Lot — Trains a regression model and scores every active vehicle with predicted days on lot
  • Take Rates — Per-attribute take rates and metric KPIs, blended across geographic layers to dealer granularity

Warehouse

  • Nessie — Iceberg catalog: deployment, VPC vs public connectivity, Auth0 authentication, utilities