Scaffolding a New OEM
This guide covers everything required to add a new OEM to the platform. The platform runs on a Dagster hybrid ECS deployment — code locations run as Fargate tasks in our AWS VPC, not on Dagster's managed serverless infrastructure. Adding a new OEM therefore requires both Python scaffolding and AWS infrastructure changes, all of which must land in the same PR.
Overview
| Step | What it does |
|---|---|
| 1. Copier scaffold | Generates the Python project skeleton under projects/ai_<oem>/ |
| 2. Dagster workspace | Registers the project in dg.toml and dagster_cloud.yaml |
| 3. S3 bucket | Iceberg warehouse for the OEM's data |
| 4. Nessie config | Tells the Nessie catalog server where the warehouse is |
| 5. ECR repository | Container registry for the OEM's Docker image |
| 6. Dockerfile | Container image built from the repo root |
| 7. CI workflow | Parallel Docker build job + Dagster deploy registration |
| 8. CI test matrix | Adds the project to the pytest matrix |
The /scaffold-oem Claude skill automates steps 1–2 and walks through 3–8 interactively.
To scaffold a new OEM, ask Claude:
"Scaffold a new OEM project for BMW"
Step 1 — Copier scaffold
Run from the repo root:
uvx copier copy \
.claude/skills/scaffold-oem/template \
projects/ai_<oem> \
--data oem_name=<oem> \
--data oem_display_name="<Display Name>" \
--data start_date=<YYYY-MM-DD> \
--defaults
The template generates:
- pyproject.toml: dependencies, mypy config, uv.sources path reference to ai_core
- definitions.py: calls load_from_defs_folder to auto-discover YAML components
- components/raw.py: skeleton for RawSourceComponent subclasses
- sources/: empty package for pure-Python API client modules
- defs/raw/defs.yaml, defs/transformed/defs.yaml, defs/consolidated/defs.yaml: placeholder pipeline components
- tests/test_<oem>.py: YAML validation test that calls load_from_defs_folder
- build.yaml: ECR registry pointer (999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>)
- Dockerfile: basic image; see Step 6 if the project uses cosy-encryption
Sync the venv and verify the scaffold loads:
uv sync --directory projects/ai_<oem>
uv run --directory projects/ai_<oem> python -c "
from ai_<oem>.definitions import defs
keys = [k.to_user_string() for a in defs.assets for k in (getattr(a, 'keys', None) or [a.key])]
print('\n'.join(sorted(keys)))
"
Expected output includes <oem>/raw/vehicles, <oem>/transformed/vehicles, <oem>/consolidated/vehicles.
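The same check can be turned into an assertion instead of reading the printed list by eye. The sketch below is illustrative only: `expected_asset_keys` and `missing_asset_keys` are hypothetical helper names (not part of ai_core), and the key layout is taken from the expected output listed above.

```python
"""Hypothetical helper: compare discovered asset keys with the scaffold's placeholders."""

# The three pipeline layers generated by the template (from the expected output above).
LAYERS = ("raw", "transformed", "consolidated")


def expected_asset_keys(oem: str) -> list[str]:
    # One placeholder "vehicles" asset per layer.
    return [f"{oem}/{layer}/vehicles" for layer in LAYERS]


def missing_asset_keys(oem: str, found: list[str]) -> list[str]:
    # Keys the scaffold should define but that were not discovered.
    found_set = set(found)
    return [key for key in expected_asset_keys(oem) if key not in found_set]


if __name__ == "__main__":
    # Example input: pretend the consolidated layer failed to load.
    discovered = ["bmw/raw/vehicles", "bmw/transformed/vehicles"]
    print(missing_asset_keys("bmw", discovered))
```

An empty result means all three placeholder assets were discovered.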
Step 2 — Register in dg.toml and dagster_cloud.yaml
dg.toml — append:
[[workspace.projects]]
path = "projects/ai_<oem>"
dagster_cloud.yaml — append to the locations list:
- location_name: ai_<oem>
  code_source:
    package_name: ai_<oem>
  build:
    directory: ./projects/ai_<oem>
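Both entries follow mechanically from the OEM name, so they can be rendered rather than typed. This is a minimal sketch under assumptions: the function names are hypothetical, and the YAML entry is assumed to sit two spaces deep under a top-level `locations:` key. Adjust the indentation to match the real file.

```python
"""Hypothetical renderer for the two workspace entries described in Step 2."""


def dg_toml_entry(oem: str) -> str:
    # TOML array-of-tables entry appended to dg.toml.
    return f'[[workspace.projects]]\npath = "projects/ai_{oem}"\n'


def dagster_cloud_location(oem: str) -> str:
    # List item for dagster_cloud.yaml; the two-space base indent is an assumption.
    lines = [
        f"  - location_name: ai_{oem}",
        "    code_source:",
        f"      package_name: ai_{oem}",
        "    build:",
        f"      directory: ./projects/ai_{oem}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(dg_toml_entry("bmw"))
    print(dagster_cloud_location("bmw"))
```

Note the naming split the guide relies on: the package and location use an underscore (`ai_bmw`), while the ECR repository and bucket names use hyphens.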
Step 3 — S3 Iceberg warehouse
In deployments/aws/terraform/solutions/dagster-agent/nessie_s3.tf, add one entry to
the warehouse_buckets local. The for_each resources (versioning, encryption,
public-access-block) and the Nessie task-role IAM policy all expand automatically:
locals {
  warehouse_buckets = {
    audi       = "ai-app-audi-iceberg-prod"
    mercedes   = "ai-app-mercedes-iceberg-prod"
    stellantis = "ai-app-stellantis-iceberg-prod"
    <oem>      = "ai-app-<oem>-iceberg-prod" # add this line
  }
}
Also add the new bucket to the UserCodeExecutionRole inline policy in
deployments/aws/cloudformation/ecs-agent-vpc-private.yaml. This role is assumed by
Dagster user-code ECS tasks; without these permissions, asset materializations fail
with an S3 AccessDenied error. Add both the object-level and bucket-level ARNs:
- Effect: Allow
  Action:
    - s3:GetObject
    - s3:PutObject
    - s3:DeleteObject
  Resource:
    - "arn:aws:s3:::ai-app-<oem>-iceberg-prod/*"
    # ... existing entries ...
- Effect: Allow
  Action:
    - s3:ListBucket
    - s3:GetBucketLocation
  Resource:
    - "arn:aws:s3:::ai-app-<oem>-iceberg-prod"
    # ... existing entries ...
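Two ARNs are needed because object actions (Get/Put/Delete) are evaluated against object ARNs ending in `/*`, while ListBucket and GetBucketLocation are evaluated against the bucket ARN itself. The sketch below builds both statements for a given OEM so the pairing is explicit. `warehouse_bucket` and `user_code_statements` are hypothetical names used for illustration; the actions and bucket pattern are copied from the policy above.

```python
"""Hypothetical builder for the two S3 statements added to UserCodeExecutionRole."""

OBJECT_ACTIONS = ("s3:GetObject", "s3:PutObject", "s3:DeleteObject")
BUCKET_ACTIONS = ("s3:ListBucket", "s3:GetBucketLocation")


def warehouse_bucket(oem: str) -> str:
    # Bucket naming pattern from the warehouse_buckets local.
    return f"ai-app-{oem}-iceberg-prod"


def user_code_statements(oem: str) -> list[dict]:
    bucket_arn = f"arn:aws:s3:::{warehouse_bucket(oem)}"
    return [
        # Object-level actions need the "/*" object ARN.
        {"Effect": "Allow", "Action": list(OBJECT_ACTIONS), "Resource": [f"{bucket_arn}/*"]},
        # Bucket-level actions need the bare bucket ARN.
        {"Effect": "Allow", "Action": list(BUCKET_ACTIONS), "Resource": [bucket_arn]},
    ]


if __name__ == "__main__":
    for statement in user_code_statements("bmw"):
        print(statement["Action"], "->", statement["Resource"])
```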
Step 4 — Nessie warehouse env var
In deployments/aws/config/nessie/prod.json, add one entry under env_vars:
"NESSIE_CATALOG_WAREHOUSES_<OEM_UPPER>_LOCATION": "s3://ai-app-<oem>-iceberg-prod"
where <OEM_UPPER> is the OEM name uppercased (e.g. BMW). This is injected into
the Nessie ECS task definition at Terraform apply time; Nessie gets a new task revision
automatically.
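As a rough illustration of the naming rule, the sketch below derives the variable name and value from the OEM name and merges them into a config dictionary. The function names are hypothetical, and the only thing assumed about prod.json is the top-level env_vars object described above.

```python
"""Hypothetical helper for the Nessie warehouse entry described in Step 4."""
import json


def nessie_warehouse_env(oem: str) -> tuple[str, str]:
    # The OEM name is uppercased in the variable name but stays lowercase in the bucket.
    key = f"NESSIE_CATALOG_WAREHOUSES_{oem.upper()}_LOCATION"
    return key, f"s3://ai-app-{oem}-iceberg-prod"


def add_warehouse(config: dict, oem: str) -> dict:
    # Return a copy with the new entry so the input config is left untouched.
    updated = {**config, "env_vars": dict(config.get("env_vars", {}))}
    key, value = nessie_warehouse_env(oem)
    updated["env_vars"][key] = value
    return updated


if __name__ == "__main__":
    # Stand-in for the parsed contents of prod.json.
    existing = json.loads('{"env_vars": {}}')
    print(json.dumps(add_warehouse(existing, "bmw"), indent=2))
```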
Step 5 — ECR repository
In deployments/aws/terraform/solutions/dagster-agent/ecr.tf, add:
resource "aws_ecr_repository" "ai_<oem>" {
  name                 = "ai-<oem>"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}

resource "aws_ecr_lifecycle_policy" "ai_<oem>" {
  repository = aws_ecr_repository.ai_<oem>.name
  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last 20 images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = 20
      }
      action = { type = "expire" }
    }]
  })
}
Also add an output to outputs.tf:
output "ecr_repository_ai_<oem>" {
  description = "ECR repository URL for the ai-<oem> code location."
  value       = aws_ecr_repository.ai_<oem>.repository_url
}
Deploy the dagster-agent Terraform solution after this change to create the repository
before the first CI build tries to push to it.
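To make the effect of the lifecycle rule concrete, the sketch below mimics an imageCountMoreThan rule with countNumber = 20: images are ordered by push time and everything beyond the newest 20 is selected for expiry. This is a simplified model, not ECR's implementation, and `images_to_expire` and the image names are hypothetical.

```python
"""Simplified model of a "keep last N images" lifecycle rule (not ECR's actual code)."""
from datetime import datetime, timedelta, timezone


def images_to_expire(pushed_at: dict[str, datetime], keep: int = 20) -> list[str]:
    # Sort newest first by push time, then select everything past the first `keep`.
    newest_first = sorted(pushed_at, key=lambda name: pushed_at[name], reverse=True)
    return newest_first[keep:]


if __name__ == "__main__":
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    # 22 hypothetical images pushed one day apart; img-00 is the oldest.
    images = {f"img-{i:02d}": start + timedelta(days=i) for i in range(22)}
    print(sorted(images_to_expire(images)))  # ['img-00', 'img-01']
```

Because the rule counts images regardless of tag status, a long-lived rollback target can be expired once 20 newer images have been pushed.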
Step 6 — Dockerfile
The Copier template generates a basic Dockerfile. Ensure ENV PATH is set so the
dagster binary installed by uv sync is on the ECS container's PATH:
FROM python:3.13-slim
WORKDIR /app
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
COPY packages/ai_core packages/ai_core
COPY projects/ai_<oem> projects/ai_<oem>
WORKDIR /app/projects/ai_<oem>
RUN uv sync --frozen --no-dev
ENV PATH="/app/projects/ai_<oem>/.venv/bin:$PATH"
EXPOSE 4000
The build context in CI is the repo root so packages/ai_core is reachable. Never
build from inside the project directory.
Projects that use cosy-encryption (Nexus private index)
Replace the RUN step with BuildKit secrets to avoid embedding credentials in image layers:
RUN --mount=type=secret,id=uv_nexus_username \
--mount=type=secret,id=uv_nexus_password \
UV_INDEX_NEXUS_USERNAME=$(cat /run/secrets/uv_nexus_username) \
UV_INDEX_NEXUS_PASSWORD=$(cat /run/secrets/uv_nexus_password) \
uv sync --frozen --no-dev
ENV PATH="/app/projects/ai_<oem>/.venv/bin:$PATH"
Step 7 — CI workflow
The CI workflow (.github/workflows/dagster-plus-hybrid.yml) builds one Docker image
per OEM as parallel jobs and then registers them all with Dagster Cloud in a single
finalize job. Three changes are required for a new OEM.
7a. New build job
Copy build_ai_audi (no Nexus) or build_ai_stellantis (Nexus / cosy-encryption)
and replace all audi/ai_audi or stellantis/ai_stellantis references:
# OEM image builds run in parallel. When adding a new OEM:
# 1. Add a new build_ai_<oem> job below.
# 2. Add the new job name to the `needs` list in the finalize job.
build_ai_<oem>:
  name: Build ai_<oem>
  needs: [setup]
  runs-on: ubuntu-22.04
  if: needs.setup.outputs.prerun_result != 'skip'
  steps:
    - name: Checkout
      uses: actions/checkout@v4
      with:
        ref: ${{ github.head_ref }}
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        aws-access-key-id: ${{ secrets.AWS_AI_PROD_KEY }}
        aws-secret-access-key: ${{ secrets.AWS_AI_PROD_SECRET }}
        aws-region: ${{ env.AWS_REGION }}
    - name: Login to Amazon ECR
      uses: aws-actions/amazon-ecr-login@v2
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
    - name: Build and push
      run: |
        docker build . \
          -f projects/ai_<oem>/Dockerfile \
          --platform linux/amd64 \
          -t 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
        docker push 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
For Nexus projects, add --secret flags and env: to the build step:
- name: Build and push
  run: |
    docker build . \
      -f projects/ai_<oem>/Dockerfile \
      --secret id=uv_nexus_username,env=NEXUS_USERNAME \
      --secret id=uv_nexus_password,env=NEXUS_PASSWORD \
      --platform linux/amd64 \
      -t 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
    docker push 999655274916.dkr.ecr.us-east-1.amazonaws.com/ai-<oem>:${{ needs.setup.outputs.image_tag }}
  env:
    NEXUS_USERNAME: ${{ secrets.NEXUS_USERNAME }}
    NEXUS_PASSWORD: ${{ secrets.NEXUS_PASSWORD }}
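The two build variants differ only in the --secret flags. The sketch below assembles the docker build argument list for either case, which can help when reproducing a CI build locally. `image_uri` and `docker_build_args` are hypothetical names; the registry, flags, and Dockerfile path are copied from the steps above, and the command is meant to run from the repo root so that packages/ai_core is inside the build context.

```python
"""Hypothetical helper that mirrors the CI docker build command for one OEM."""

REGISTRY = "999655274916.dkr.ecr.us-east-1.amazonaws.com"


def image_uri(oem: str, tag: str) -> str:
    # Repository names use a hyphen (ai-<oem>), unlike the Python package (ai_<oem>).
    return f"{REGISTRY}/ai-{oem}:{tag}"


def docker_build_args(oem: str, tag: str, nexus: bool = False) -> list[str]:
    # "." is the build context: the repo root, never the project directory.
    args = ["docker", "build", ".", "-f", f"projects/ai_{oem}/Dockerfile"]
    if nexus:
        # BuildKit reads each secret from the named environment variable.
        args += [
            "--secret", "id=uv_nexus_username,env=NEXUS_USERNAME",
            "--secret", "id=uv_nexus_password,env=NEXUS_PASSWORD",
        ]
    args += ["--platform", "linux/amd64", "-t", image_uri(oem, tag)]
    return args


if __name__ == "__main__":
    print(" ".join(docker_build_args("bmw", "abc123", nexus=True)))
```

The list can be passed to `subprocess.run(args, check=True)` with NEXUS_USERNAME and NEXUS_PASSWORD set in the environment.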
7b. Register build output in finalize
Add a set-build-output step inside the finalize job, immediately before the
Deploy to Dagster Cloud step:
- name: Register build output ai_<oem>
  if: needs.setup.outputs.prerun_result != 'skip'
  uses: dagster-io/dagster-cloud-action/actions/utils/dg-cli@v1.12.21
  with:
    command: >-
      plus deploy set-build-output
      --location-name=ai_<oem>
      --image-tag=${{ needs.setup.outputs.image_tag }}
7c. Update finalize.needs
Add the new build job to the needs list:
finalize:
  needs: [setup, build_ai_audi, build_ai_mercedes, build_ai_stellantis, build_ai_<oem>]
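A forgotten needs entry is easy to miss in review, because the workflow still parses and finalize simply stops waiting for the new build. The sketch below shows one way to detect it. It assumes the workflow file has already been parsed into a plain dictionary (for example with a YAML library, which is not shown here), and `missing_from_finalize` is a hypothetical name.

```python
"""Hypothetical check: every build_ai_* job must be listed in finalize.needs."""


def missing_from_finalize(workflow: dict) -> list[str]:
    jobs = workflow.get("jobs", {})
    needs = jobs.get("finalize", {}).get("needs", [])
    # GitHub Actions accepts either a single job name or a list for `needs`.
    if isinstance(needs, str):
        needs = [needs]
    return sorted(
        name for name in jobs
        if name.startswith("build_ai_") and name not in needs
    )


if __name__ == "__main__":
    # Stand-in for the parsed workflow; build_ai_bmw was added but not wired in.
    workflow = {
        "jobs": {
            "setup": {},
            "build_ai_audi": {"needs": ["setup"]},
            "build_ai_bmw": {"needs": ["setup"]},
            "finalize": {"needs": ["setup", "build_ai_audi"]},
        }
    }
    print(missing_from_finalize(workflow))
```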
Step 8 — CI test matrix
In .github/workflows/ci.yml, add the new project:
- path: projects/ai_<oem>
After the PR merges
- Run the Deploy Dagster Infrastructure workflow (dagster-agent workspace) to apply the Terraform changes. This creates the S3 bucket and ECR repository and updates the Nessie task definition.
- The next hybrid deploy CI run will build and push the new OEM image and register it with Dagster Cloud.
Implementing data sources
Once the scaffold is merged, implement the OEM's data sources:
- Add source modules in sources/ (pure Python, no Dagster imports)
- Implement RawSourceComponent subclasses in components/raw.py (see ai_audi for reference)
- Replace the placeholder YAML with real datasets pointing at the new subclasses
- Write source tests under tests/sources/
See Code Structure for the full implementation guide.