Scaffolding a New OEM

This guide covers everything required to add a new OEM to the platform. The platform runs on a Dagster hybrid ECS deployment — code locations run as Fargate tasks in our AWS VPC, not on Dagster's managed serverless infrastructure. Adding a new OEM therefore requires both Python scaffolding and AWS infrastructure changes, all of which must land in the same PR.

Overview

Step	What it does
1. Copier scaffold	Generates the Python project skeleton under `projects/ai_<oem>/`
2. Dagster workspace	Registers the project in `dg.toml` and `dagster_cloud.yaml`
3. S3 bucket	Iceberg warehouse for the OEM's data
4. Nessie config	Tells the Nessie catalog server where the warehouse is
5. ECR repository	Container registry for the OEM's Docker image
6. Dockerfile	Container image built from the repo root
7. CI workflow	Parallel Docker build job + Dagster deploy registration
8. CI test matrix	Adds the project to the pytest matrix

The /scaffold-oem Claude skill automates steps 1–2 and walks through 3–8 interactively. To scaffold a new OEM, ask Claude:

"Scaffold a new OEM project for BMW"

Step 1 — Copier scaffold

Run from the repo root:

uvx copier copy \
  .claude/skills/scaffold-oem/template \
  projects/ai_<oem> \
  --data oem_name=<oem> \
  --data oem_display_name="<Display Name>" \
  --data start_date=<YYYY-MM-DD> \
  --defaults

The template generates:

pyproject.toml — dependencies, mypy config, uv.sources path reference to ai_core
definitions.py — calls load_from_defs_folder to auto-discover YAML components
components/raw.py — skeleton for RawSourceComponent subclasses
sources/ — empty package for pure-Python API client modules
defs/raw/defs.yaml, defs/transformed/defs.yaml, defs/consolidated/defs.yaml — placeholder pipeline components
tests/test_<oem>.py — YAML validation test that calls load_from_defs_folder
build.yaml — ECR registry pointer (999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>)
Dockerfile — basic image; see Step 6 if the project uses cosy-encryption

Sync the venv and verify the scaffold loads:

uv sync --directory projects/ai_<oem>

uv run --directory projects/ai_<oem> python -c "
from ai_<oem>.definitions import defs
keys = [k.to_user_string() for a in defs.assets for k in (getattr(a, 'keys', None) or [a.key])]
print('\n'.join(sorted(keys)))
"

Expected output includes <oem>/raw/vehicles, <oem>/transformed/vehicles, <oem>/consolidated/vehicles.
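
The same check can be scripted so that a missing placeholder asset fails loudly instead of relying on a visual scan of the output. The sketch below is a hypothetical helper (neither function is part of the template) and assumes the keys have already been rendered as slash-separated strings like the ones listed above.

```python
def expected_asset_keys(oem: str) -> set[str]:
    """Placeholder asset keys a fresh scaffold is expected to expose."""
    return {f"{oem}/{layer}/vehicles" for layer in ("raw", "transformed", "consolidated")}


def missing_asset_keys(found: list[str], oem: str) -> list[str]:
    """Return the expected keys that are absent from `found`, sorted."""
    return sorted(expected_asset_keys(oem) - set(found))


if __name__ == "__main__":
    # Hypothetical output captured from the verification command for oem="bmw".
    found = ["bmw/raw/vehicles", "bmw/transformed/vehicles"]
    print(missing_asset_keys(found, "bmw"))
```

An empty list means all three placeholder assets loaded.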

Step 2 — Register in `dg.toml` and `dagster_cloud.yaml`

dg.toml — append:

[[workspace.projects]]
path = "projects/ai_<oem>"

dagster_cloud.yaml — append to the locations list:

- location_name: ai_<oem>
  code_source:
    package_name: ai_<oem>
  build:
    directory: ./projects/ai_<oem>

Step 3 — S3 Iceberg warehouse

In deployments/aws/terraform/solutions/dagster-agent/nessie_s3.tf, add one entry to the warehouse_buckets local. The for_each resources (versioning, encryption, public-access-block) and the Nessie task-role IAM policy all expand automatically:

locals {
  warehouse_buckets = {
    audi       = "ai-app-audi-iceberg-prod"
    mercedes   = "ai-app-mercedes-iceberg-prod"
    stellantis = "ai-app-stellantis-iceberg-prod"
    <oem>      = "ai-app-<oem>-iceberg-prod"   # add this line
  }
}

Also add the new bucket to the UserCodeExecutionRole inline policy in deployments/aws/cloudformation/ecs-agent-vpc-private.yaml. Dagster user-code ECS tasks assume this role; without the new entries, asset materializations fail with an S3 AccessDenied error. Add both the object-level and bucket-level ARNs:

              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                  - s3:DeleteObject
                Resource:
                  - "arn:aws:s3:::ai-app-<oem>-iceberg-prod/*"
                  # ... existing entries ...
              - Effect: Allow
                Action:
                  - s3:ListBucket
                  - s3:GetBucketLocation
                Resource:
                  - "arn:aws:s3:::ai-app-<oem>-iceberg-prod"
                  # ... existing entries ...
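
The two ARNs differ because object actions (GetObject, PutObject, DeleteObject) are evaluated against object ARNs, which end in `/*`, while ListBucket and GetBucketLocation are evaluated against the bucket ARN itself. The sketch below builds both statements for one OEM so the pairing is easy to see; `bucket_statements` is a hypothetical helper, not something the repository provides.

```python
import json


def warehouse_bucket(oem: str) -> str:
    """Bucket name following the ai-app-<oem>-iceberg-prod convention."""
    return f"ai-app-{oem}-iceberg-prod"


def bucket_statements(oem: str) -> list[dict]:
    """Object-level and bucket-level IAM statements for one warehouse bucket."""
    bucket_arn = f"arn:aws:s3:::{warehouse_bucket(oem)}"
    return [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"{bucket_arn}/*"],  # objects inside the bucket
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": [bucket_arn],  # the bucket itself
        },
    ]


if __name__ == "__main__":
    print(json.dumps(bucket_statements("bmw"), indent=2))
```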

Step 4 — Nessie warehouse env var

In deployments/aws/config/nessie/prod.json, add one entry under env_vars:

"NESSIE_CATALOG_WAREHOUSES_<OEM_UPPER>_LOCATION": "s3://ai-app-<oem>-iceberg-prod"

where <OEM_UPPER> is the OEM name uppercased (e.g. BMW). This is injected into the Nessie ECS task definition at Terraform apply time; Nessie gets a new task revision automatically.
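
As a quick illustration of the naming rule, the sketch below derives the env var name and value from the OEM name. The helper is hypothetical, and the slug check (lowercase letters and digits only) is an assumption added for the example rather than a rule stated by the platform.

```python
import re

# Assumed slug rule for illustration: lowercase letters and digits only.
_SLUG = re.compile(r"[a-z][a-z0-9]*")


def nessie_warehouse_env(oem: str) -> tuple[str, str]:
    """Return the (name, value) pair to add under env_vars."""
    if not _SLUG.fullmatch(oem):
        raise ValueError(f"unexpected OEM name: {oem!r}")
    name = f"NESSIE_CATALOG_WAREHOUSES_{oem.upper()}_LOCATION"
    value = f"s3://ai-app-{oem}-iceberg-prod"
    return name, value


if __name__ == "__main__":
    print(nessie_warehouse_env("bmw"))
```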

Step 5 — ECR repository

In deployments/aws/terraform/solutions/dagster-agent/ecr.tf, add:

resource "aws_ecr_repository" "ai_<oem>" {
  name                 = "ai-<oem>"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}

resource "aws_ecr_lifecycle_policy" "ai_<oem>" {
  repository = aws_ecr_repository.ai_<oem>.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last 20 images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = 20
      }
      action = { type = "expire" }
    }]
  })
}

Also add an output to outputs.tf:

output "ecr_repository_ai_<oem>" {
  description = "ECR repository URL for the ai-<oem> code location."
  value       = aws_ecr_repository.ai_<oem>.repository_url
}

Deploy the dagster-agent Terraform solution after this change to create the repository before the first CI build tries to push to it.
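
To make the effect of the lifecycle rule concrete, the sketch below models an imageCountMoreThan rule with countNumber 20: images are ordered by push time and everything beyond the newest 20 is expired. It is a simplified model for illustration, not a call into ECR, and the image names and timestamps are made up.

```python
from datetime import datetime, timedelta


def images_to_expire(pushed_at: dict[str, datetime], keep: int = 20) -> list[str]:
    """Return the images beyond the `keep` most recently pushed ones."""
    newest_first = sorted(pushed_at, key=pushed_at.get, reverse=True)
    return newest_first[keep:]


if __name__ == "__main__":
    start = datetime(2025, 1, 1)
    # 23 hypothetical images pushed one day apart; img-0 is the oldest.
    images = {f"img-{i}": start + timedelta(days=i) for i in range(23)}
    print(sorted(images_to_expire(images)))
```

With 23 images in the repository, the three oldest are the ones that fall outside the retention window.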

Step 6 — Dockerfile

The Copier template generates a basic Dockerfile. Ensure ENV PATH is set so the dagster binary installed by uv sync is on the ECS container's PATH:

FROM python:3.13-slim

WORKDIR /app

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

COPY packages/ai_core packages/ai_core
COPY projects/ai_<oem> projects/ai_<oem>

WORKDIR /app/projects/ai_<oem>
RUN uv sync --frozen --no-dev

ENV PATH="/app/projects/ai_<oem>/.venv/bin:$PATH"

EXPOSE 4000

The build context in CI is the repo root so packages/ai_core is reachable. Never build from inside the project directory.
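
Both COPY instructions use paths relative to the build context, so a build started from the wrong directory fails only once Docker reaches them. A hypothetical preflight check along these lines could catch that earlier when building locally; the function name and the list of required paths are assumptions drawn from the Dockerfile above.

```python
import tempfile
from pathlib import Path


def missing_build_inputs(context: Path, oem: str) -> list[str]:
    """Return the paths the Dockerfile needs that are absent from `context`."""
    required = [
        "packages/ai_core",
        f"projects/ai_{oem}",
        f"projects/ai_{oem}/Dockerfile",
    ]
    return [rel for rel in required if not (context / rel).exists()]


if __name__ == "__main__":
    # Demo against a throwaway directory that only contains packages/ai_core.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "packages/ai_core").mkdir(parents=True)
        print(missing_build_inputs(root, "bmw"))
```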

Projects that use `cosy-encryption` (Nexus private index)

Replace the plain RUN uv sync step with one that mounts the Nexus credentials as BuildKit secrets, so they are never embedded in image layers:

RUN --mount=type=secret,id=uv_nexus_username \
    --mount=type=secret,id=uv_nexus_password \
    UV_INDEX_NEXUS_USERNAME=$(cat /run/secrets/uv_nexus_username) \
    UV_INDEX_NEXUS_PASSWORD=$(cat /run/secrets/uv_nexus_password) \
    uv sync --frozen --no-dev

ENV PATH="/app/projects/ai_<oem>/.venv/bin:$PATH"

Step 7 — CI workflow

The CI workflow (.github/workflows/dagster-plus-hybrid.yml) builds one Docker image per OEM as parallel jobs and then registers them all with Dagster Cloud in a single finalize job. Three changes are required for a new OEM.

7a. New build job

Copy build_ai_audi (no Nexus) or build_ai_stellantis (Nexus / cosy-encryption) and replace all audi/ai_audi or stellantis/ai_stellantis references:

# OEM image builds run in parallel. When adding a new OEM:
#   1. Add a new build_ai_<oem> job below.
#   2. Add the new job name to the `needs` list in the finalize job.
#   3. Add a set-build-output step for the new location to the finalize job.

build_ai_<oem>:
  name: Build ai_<oem>
  needs: [setup]
  runs-on: ubuntu-22.04
  if: needs.setup.outputs.prerun_result != 'skip'
  steps:
    - name: Checkout
      uses: actions/checkout@v4
      with:
        ref: ${{ github.head_ref }}
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_AI_PROD_KEY }}
        aws-secret-access-key: ${{ secrets.AWS_AI_PROD_SECRET }}
        aws-region: ${{ env.AWS_REGION }}
    - name: Login to Amazon ECR
      uses: aws-actions/amazon-ecr-login@v2
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
    - name: Build and push
      run: |
        docker build . \
          -f projects/ai_<oem>/Dockerfile \
          --platform linux/amd64 \
          -t 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
        docker push 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}

For Nexus projects, add --secret flags and env: to the build step:

    - name: Build and push
      run: |
        docker build . \
          -f projects/ai_<oem>/Dockerfile \
          --secret id=uv_nexus_username,env=NEXUS_USERNAME \
          --secret id=uv_nexus_password,env=NEXUS_PASSWORD \
          --platform linux/amd64 \
          -t 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
        docker push 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
      env:
        NEXUS_USERNAME: ${{ secrets.NEXUS_USERNAME }}
        NEXUS_PASSWORD: ${{ secrets.NEXUS_PASSWORD }}
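
The only difference between the two variants is the pair of --secret flags. The sketch below assembles the same docker build arguments in Python to make that difference explicit; `docker_build_argv` is a hypothetical helper for illustration and is not used by the workflow.

```python
REGISTRY = "999655274916.dkr.ecr.us-east-1.amazonaws.com"


def docker_build_argv(oem: str, image_tag: str, nexus: bool = False) -> list[str]:
    """Build the docker build argument list for one OEM image."""
    argv = ["docker", "build", ".", "-f", f"projects/ai_{oem}/Dockerfile"]
    if nexus:
        # Secrets are read from environment variables set on the CI step.
        argv += [
            "--secret", "id=uv_nexus_username,env=NEXUS_USERNAME",
            "--secret", "id=uv_nexus_password,env=NEXUS_PASSWORD",
        ]
    argv += ["--platform", "linux/amd64", "-t", f"{REGISTRY}/ai-{oem}:{image_tag}"]
    return argv


if __name__ == "__main__":
    print(" ".join(docker_build_argv("bmw", "abc123", nexus=True)))
```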

7b. Register build output in `finalize`

Add a set-build-output step inside the finalize job, immediately before the Deploy to Dagster Cloud step:

      - name: Register build output ai_<oem>
        if: needs.setup.outputs.prerun_result != 'skip'
        uses: dagster-io/dagster-cloud-action/actions/utils/dg-cli@v1.12.21
        with:
          command: >-
            plus deploy set-build-output
            --location-name=ai_<oem>
            --image-tag=${{ needs.setup.outputs.image_tag }}

7c. Update `finalize.needs`

Add the new build job to the needs list:

finalize:
  needs: [setup, build_ai_audi, build_ai_mercedes, build_ai_stellantis, build_ai_<oem>]
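
A build job that is missing from finalize.needs is easy to overlook in review. The sketch below shows one way to detect it, assuming the workflow's `jobs` mapping has already been parsed into a dict (the parsing itself is not shown); the function name is hypothetical.

```python
def missing_from_finalize(jobs: dict[str, dict]) -> list[str]:
    """Return build_ai_* job names that finalize does not list in `needs`."""
    needs = set(jobs.get("finalize", {}).get("needs", []))
    return sorted(
        name for name in jobs
        if name.startswith("build_ai_") and name not in needs
    )


if __name__ == "__main__":
    # Illustrative jobs mapping where build_ai_bmw was forgotten in finalize.needs.
    jobs = {
        "setup": {},
        "build_ai_audi": {"needs": ["setup"]},
        "build_ai_bmw": {"needs": ["setup"]},
        "finalize": {"needs": ["setup", "build_ai_audi"]},
    }
    print(missing_from_finalize(jobs))
```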

Step 8 — CI test matrix

In .github/workflows/ci.yml, add the new project:

- path: projects/ai_<oem>

After the PR merges

Run the Deploy Dagster Infrastructure workflow (dagster-agent workspace) to apply the Terraform changes. This creates the S3 bucket and the ECR repository and updates the Nessie task definition.
The next hybrid deploy CI run will build and push the new OEM image and register it with Dagster Cloud.

Implementing data sources

Once the scaffold is merged, implement the OEM's data sources:

Add source modules in sources/ — pure Python, no Dagster imports
Implement RawSourceComponent subclasses in components/raw.py (see ai_audi for reference)
Replace the placeholder YAML with real datasets pointing at the new subclasses
Write source tests under tests/sources/

See Code Structure for the full implementation guide.
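
The rule that modules in sources/ stay free of Dagster imports can be checked mechanically. The sketch below uses the standard-library ast module to list any `dagster` imports in a piece of source text; it is a hypothetical check, not part of the template, and it only looks at the top-level `dagster` package name.

```python
import ast


def dagster_imports(source: str) -> list[str]:
    """Return the dagster modules imported anywhere in `source`, sorted."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for `from . import x`
        else:
            continue
        found += [n for n in names if n == "dagster" or n.startswith("dagster.")]
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical source module that breaks the rule.
    print(dagster_imports("import json\nfrom dagster import asset\n"))
```

A test under tests/sources/ could read each module's text and assert that the returned list is empty.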
