Scaffolding a New OEM

This guide covers everything required to add a new OEM to the platform. The platform runs on a Dagster hybrid ECS deployment — code locations run as Fargate tasks in our AWS VPC, not on Dagster's managed serverless infrastructure. Adding a new OEM therefore requires both Python scaffolding and AWS infrastructure changes, all of which must land in the same PR.

Overview

Step                   What it does
1. Copier scaffold     Generates the Python project skeleton under projects/ai_<oem>/
2. Dagster workspace   Registers the project in dg.toml and dagster_cloud.yaml
3. S3 bucket           Iceberg warehouse for the OEM's data
4. Nessie config       Tells the Nessie catalog server where the warehouse is
5. ECR repository      Container registry for the OEM's Docker image
6. Dockerfile          Container image built from the repo root
7. CI workflow         Parallel Docker build job + Dagster deploy registration
8. CI test matrix      Adds the project to the pytest matrix

The /scaffold-oem Claude skill automates steps 1–2 and walks through 3–8 interactively. To scaffold a new OEM, ask Claude:

"Scaffold a new OEM project for BMW"


Step 1 — Copier scaffold

Run from the repo root:

uvx copier copy \
  .claude/skills/scaffold-oem/template \
  projects/ai_<oem> \
  --data oem_name=<oem> \
  --data oem_display_name="<Display Name>" \
  --data start_date=<YYYY-MM-DD> \
  --defaults

The template generates:

  • pyproject.toml — dependencies, mypy config, uv.sources path reference to ai_core
  • definitions.py — calls load_from_defs_folder to auto-discover YAML components
  • components/raw.py — skeleton for RawSourceComponent subclasses
  • sources/ — empty package for pure-Python API client modules
  • defs/raw/defs.yaml, defs/transformed/defs.yaml, defs/consolidated/defs.yaml — placeholder pipeline components
  • tests/test_<oem>.py — YAML validation test that calls load_from_defs_folder
  • build.yaml — ECR registry pointer (999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>)
  • Dockerfile — basic image; see Step 6 if the project uses cosy-encryption
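For scripted use, the copier invocation can be assembled in Python. This is a minimal sketch, not something that exists in the repo: `copier_command` is a hypothetical helper, and the lowercase-identifier check on `oem_name` is an assumption based on the `ai_<oem>` package naming.

```python
import re
import shlex
from datetime import date


def copier_command(oem_name: str, display_name: str, start_date: str) -> list[str]:
    """Build the argv list for the `uvx copier copy` call from Step 1."""
    # Assumption: oem_name ends up in a Python package name (ai_<oem>),
    # so restrict it to a lowercase identifier.
    if not re.fullmatch(r"[a-z][a-z0-9_]*", oem_name):
        raise ValueError(f"invalid oem_name: {oem_name!r}")
    # Raises ValueError if start_date is not a valid ISO 8601 date.
    date.fromisoformat(start_date)
    return [
        "uvx", "copier", "copy",
        ".claude/skills/scaffold-oem/template",
        f"projects/ai_{oem_name}",
        "--data", f"oem_name={oem_name}",
        "--data", f"oem_display_name={display_name}",
        "--data", f"start_date={start_date}",
        "--defaults",
    ]


if __name__ == "__main__":
    # shlex.join quotes arguments that contain spaces.
    print(shlex.join(copier_command("bmw", "BMW", "2024-01-01")))
```

Passing the returned list to `subprocess.run(..., check=True)` from the repo root would run the scaffold without going through a shell.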

Sync the venv and verify the scaffold loads:

uv sync --directory projects/ai_<oem>

uv run --directory projects/ai_<oem> python -c "
from ai_<oem>.definitions import defs
keys = [k.to_user_string() for a in defs.assets for k in (getattr(a, 'keys', None) or [a.key])]
print('\n'.join(sorted(keys)))
"

Expected output includes <oem>/raw/vehicles, <oem>/transformed/vehicles, <oem>/consolidated/vehicles.
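To make this check assertable rather than visual, the printed keys can be compared against the three placeholder keys. A sketch only: `missing_placeholder_keys` is a hypothetical helper, and it assumes the keys arrive as slash-joined strings such as bmw/raw/vehicles.

```python
def missing_placeholder_keys(oem: str, found: list[str]) -> list[str]:
    """Return the scaffold's placeholder asset keys that are absent from `found`."""
    layers = ("raw", "transformed", "consolidated")
    expected = [f"{oem}/{layer}/vehicles" for layer in layers]
    present = set(found)
    return [key for key in expected if key not in present]


if __name__ == "__main__":
    printed = ["bmw/raw/vehicles", "bmw/transformed/vehicles"]
    print(missing_placeholder_keys("bmw", printed))
```

An empty list means all three placeholder assets loaded.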


Step 2 — Register in dg.toml and dagster_cloud.yaml

dg.toml — append:

[[workspace.projects]]
path = "projects/ai_<oem>"

dagster_cloud.yaml — append to the locations list:

- location_name: ai_<oem>
  code_source:
    package_name: ai_<oem>
  build:
    directory: ./projects/ai_<oem>

Step 3 — S3 Iceberg warehouse

In deployments/aws/terraform/solutions/dagster-agent/nessie_s3.tf, add one entry to the warehouse_buckets local. The for_each resources (versioning, encryption, public-access-block) and the Nessie task-role IAM policy all expand automatically:

locals {
  warehouse_buckets = {
    audi       = "ai-app-audi-iceberg-prod"
    mercedes   = "ai-app-mercedes-iceberg-prod"
    stellantis = "ai-app-stellantis-iceberg-prod"
    <oem>      = "ai-app-<oem>-iceberg-prod" # add this line
  }
}

Also add the new bucket to the UserCodeExecutionRole inline policy in deployments/aws/cloudformation/ecs-agent-vpc-private.yaml. This role is assumed by Dagster user-code ECS tasks; without these permissions, asset materializations fail with an S3 AccessDenied error. Add both the object-level and bucket-level ARNs:

              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                  - s3:DeleteObject
                Resource:
                  - "arn:aws:s3:::ai-app-<oem>-iceberg-prod/*"
                  # ... existing entries ...
              - Effect: Allow
                Action:
                  - s3:ListBucket
                  - s3:GetBucketLocation
                Resource:
                  - "arn:aws:s3:::ai-app-<oem>-iceberg-prod"
                  # ... existing entries ...
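The two statements need different ARN forms: object actions (GetObject, PutObject, DeleteObject) match the ARN ending in /*, while ListBucket and GetBucketLocation match the bare bucket ARN. A small sketch of deriving both from the OEM name, assuming the ai-app-<oem>-iceberg-prod naming used above; `warehouse_arns` is a hypothetical helper.

```python
def warehouse_bucket(oem: str) -> str:
    """Bucket name following the warehouse_buckets naming in nessie_s3.tf."""
    return f"ai-app-{oem}-iceberg-prod"


def warehouse_arns(oem: str) -> dict[str, str]:
    """Return the bucket-level and object-level ARNs for an OEM's warehouse."""
    bucket_arn = f"arn:aws:s3:::{warehouse_bucket(oem)}"
    return {
        "bucket": bucket_arn,          # for s3:ListBucket, s3:GetBucketLocation
        "objects": f"{bucket_arn}/*",  # for s3:GetObject, s3:PutObject, s3:DeleteObject
    }


if __name__ == "__main__":
    for kind, arn in warehouse_arns("bmw").items():
        print(f"{kind}: {arn}")
```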

Step 4 — Nessie warehouse env var

In deployments/aws/config/nessie/prod.json, add one entry under env_vars:

"NESSIE_CATALOG_WAREHOUSES_<OEM_UPPER>_LOCATION": "s3://ai-app-<oem>-iceberg-prod"

where <OEM_UPPER> is the OEM name uppercased (e.g. BMW). This is injected into the Nessie ECS task definition at Terraform apply time; Nessie gets a new task revision automatically.
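The naming rule can be expressed as a small helper. This is a sketch under two assumptions: that env_vars in prod.json is a JSON object mapping names to values, and that the bucket follows the ai-app-<oem>-iceberg-prod pattern from Step 3. Both function names are hypothetical.

```python
import json


def nessie_warehouse_env(oem: str) -> tuple[str, str]:
    """Return the (name, value) pair for the OEM's Nessie warehouse variable."""
    name = f"NESSIE_CATALOG_WAREHOUSES_{oem.upper()}_LOCATION"
    value = f"s3://ai-app-{oem}-iceberg-prod"
    return name, value


def add_warehouse(config: dict, oem: str) -> dict:
    """Return a copy of the config with the warehouse entry added under env_vars."""
    name, value = nessie_warehouse_env(oem)
    updated = dict(config)
    # Build a new env_vars dict so the input config is left untouched.
    updated["env_vars"] = {**config.get("env_vars", {}), name: value}
    return updated


if __name__ == "__main__":
    print(json.dumps(add_warehouse({"env_vars": {}}, "bmw"), indent=2))
```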


Step 5 — ECR repository

In deployments/aws/terraform/solutions/dagster-agent/ecr.tf, add:

resource "aws_ecr_repository" "ai_<oem>" {
  name                 = "ai-<oem>"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}

resource "aws_ecr_lifecycle_policy" "ai_<oem>" {
  repository = aws_ecr_repository.ai_<oem>.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last 20 images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = 20
      }
      action = { type = "expire" }
    }]
  })
}

Also add an output to outputs.tf:

output "ecr_repository_ai_<oem>" {
  description = "ECR repository URL for the ai-<oem> code location."
  value       = aws_ecr_repository.ai_<oem>.repository_url
}

Deploy the dagster-agent Terraform solution after this change to create the repository before the first CI build tries to push to it.
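One detail that is easy to get wrong here is the separator: the Terraform resource label (like the Python package and Dagster location) uses an underscore, while the ECR repository name uses a hyphen. The sketch below derives both along with the full image URI; the helper names are hypothetical, and the registry host is the one shown in build.yaml.

```python
# Registry host as it appears in build.yaml and the CI workflow.
ECR_REGISTRY = "999655274916.dkr.ecr.us-east-1.amazonaws.com"


def terraform_resource_name(oem: str) -> str:
    """Terraform resource label; matches the package and location name (underscore)."""
    return f"ai_{oem}"


def ecr_repository_name(oem: str) -> str:
    """ECR repository name (hyphen)."""
    return f"ai-{oem}"


def image_uri(oem: str, tag: str) -> str:
    """Full image reference that CI builds and pushes."""
    return f"{ECR_REGISTRY}/{ecr_repository_name(oem)}:{tag}"


if __name__ == "__main__":
    print(terraform_resource_name("bmw"), ecr_repository_name("bmw"))
    print(image_uri("bmw", "abc123"))
```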


Step 6 — Dockerfile

The Copier template generates a basic Dockerfile. Ensure ENV PATH is set so the dagster binary installed by uv sync is on the ECS container's PATH:

FROM python:3.13-slim

WORKDIR /app

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

COPY packages/ai_core packages/ai_core
COPY projects/ai_<oem> projects/ai_<oem>

WORKDIR /app/projects/ai_<oem>
RUN uv sync --frozen --no-dev

ENV PATH="/app/projects/ai_<oem>/.venv/bin:$PATH"

EXPOSE 4000

The build context in CI is the repo root, so packages/ai_core is reachable. Never build from inside the project directory: COPY cannot read paths outside the build context, so the COPY packages/ai_core step would fail.

Projects that use cosy-encryption (Nexus private index)

Replace the RUN step with BuildKit secrets to avoid embedding credentials in image layers:

RUN --mount=type=secret,id=uv_nexus_username \
    --mount=type=secret,id=uv_nexus_password \
    UV_INDEX_NEXUS_USERNAME=$(cat /run/secrets/uv_nexus_username) \
    UV_INDEX_NEXUS_PASSWORD=$(cat /run/secrets/uv_nexus_password) \
    uv sync --frozen --no-dev

ENV PATH="/app/projects/ai_<oem>/.venv/bin:$PATH"

Step 7 — CI workflow

The CI workflow (.github/workflows/dagster-plus-hybrid.yml) builds one Docker image per OEM as parallel jobs and then registers them all with Dagster Cloud in a single finalize job. Three changes are required for a new OEM.

7a. New build job

Copy build_ai_audi (no Nexus) or build_ai_stellantis (Nexus / cosy-encryption) and replace all audi/ai_audi or stellantis/ai_stellantis references:

  # OEM image builds run in parallel. When adding a new OEM:
  #   1. Add a new build_ai_<oem> job below.
  #   2. Add the new job name to the `needs` list in the finalize job.

  build_ai_<oem>:
    name: Build ai_<oem>
    needs: [setup]
    runs-on: ubuntu-22.04
    if: needs.setup.outputs.prerun_result != 'skip'
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ github.head_ref }}
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_AI_PROD_KEY }}
          aws-secret-access-key: ${{ secrets.AWS_AI_PROD_SECRET }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Login to Amazon ECR
        uses: aws-actions/amazon-ecr-login@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build and push
        run: |
          docker build . \
            -f projects/ai_<oem>/Dockerfile \
            --platform linux/amd64 \
            -t 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
          docker push 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}

For Nexus projects, add --secret flags and env: to the build step:

      - name: Build and push
        run: |
          docker build . \
            -f projects/ai_<oem>/Dockerfile \
            --secret id=uv_nexus_username,env=NEXUS_USERNAME \
            --secret id=uv_nexus_password,env=NEXUS_PASSWORD \
            --platform linux/amd64 \
            -t 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
          docker push 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
        env:
          NEXUS_USERNAME: ${{ secrets.NEXUS_USERNAME }}
          NEXUS_PASSWORD: ${{ secrets.NEXUS_PASSWORD }}

7b. Register build output in finalize

Add a set-build-output step inside the finalize job, immediately before the Deploy to Dagster Cloud step:

      - name: Register build output ai_<oem>
        if: needs.setup.outputs.prerun_result != 'skip'
        uses: dagster-io/dagster-cloud-action/actions/utils/dg-cli@v1.12.21
        with:
          command: >-
            plus deploy set-build-output
            --location-name=ai_<oem>
            --image-tag=${{ needs.setup.outputs.image_tag }}

7c. Update finalize.needs

Add the new build job to the needs list:

  finalize:
    needs: [setup, build_ai_audi, build_ai_mercedes, build_ai_stellantis, build_ai_<oem>]
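Forgetting 7c is easy to miss in review, because the workflow stays valid even when finalize does not wait for the new build. One possible guard is sketched below. It works on the `jobs` mapping of an already-parsed workflow (for example from a YAML parser, which is not in the standard library); `jobs_missing_from_finalize` is a hypothetical helper and assumes build jobs are named with the build_ai_ prefix as above.

```python
def jobs_missing_from_finalize(jobs: dict) -> list[str]:
    """Return build_ai_* job names that finalize.needs does not list."""
    needs = jobs.get("finalize", {}).get("needs", [])
    # GitHub Actions allows `needs` to be a single string or a list.
    if isinstance(needs, str):
        needs = [needs]
    return sorted(
        name for name in jobs
        if name.startswith("build_ai_") and name not in needs
    )


if __name__ == "__main__":
    # Illustrative structure only; job bodies are omitted.
    jobs = {
        "setup": {},
        "build_ai_audi": {},
        "build_ai_bmw": {},
        "finalize": {"needs": ["setup", "build_ai_audi"]},
    }
    print(jobs_missing_from_finalize(jobs))
```

An empty result means every build job gates the finalize job.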

Step 8 — CI test matrix

In .github/workflows/ci.yml, add the new project:

- path: projects/ai_<oem>

After the PR merges

  1. Run the Deploy Dagster Infrastructure workflow (dagster-agent workspace) to apply the Terraform changes — this creates the S3 bucket, ECR repository, and updates the Nessie task definition.
  2. The next hybrid deploy CI run will build and push the new OEM image and register it with Dagster Cloud.

Implementing data sources

Once the scaffold is merged, implement the OEM's data sources:

  1. Add source modules in sources/ — pure Python, no Dagster imports
  2. Implement RawSourceComponent subclasses in components/raw.py (see ai_audi for reference)
  3. Replace the placeholder YAML with real datasets pointing at the new subclasses
  4. Write source tests under tests/sources/
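The "no Dagster imports" rule for sources/ can be checked mechanically. The following is a minimal sketch using the standard-library ast module; `dagster_imports` is a hypothetical helper, and treating dagster_-prefixed packages as Dagster imports is an assumption about how strictly the rule is meant.

```python
import ast


def dagster_imports(source: str) -> list[str]:
    """Return the Dagster module names imported by the given source text."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like `from . import x`.
            names = [node.module or ""]
        else:
            continue
        found.extend(
            name for name in names
            if name == "dagster" or name.startswith(("dagster.", "dagster_"))
        )
    return found


if __name__ == "__main__":
    print(dagster_imports("import json\nfrom dagster import asset\n"))
```

A test under tests/sources/ could read each file in sources/ and assert that the function returns an empty list. The source is only parsed, never executed, so the check does not need the module's dependencies installed.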

See Code Structure for the full implementation guide.