Skip to main content

Dagster Execution Model

How the ECS executor works

In production (any Dagster Cloud deployment), each Dagster job run involves two distinct kinds of ECS tasks with very different roles.

Run task — the coordinator

When you trigger a job from the Dagster UI or a schedule fires, Dagster launches a run task. This task is a thin coordinator: it loads the Dagster definitions, initializes the executor, and then spends the rest of its lifetime making ECS API calls — launching step tasks, polling their status, and reporting results back to the control plane.

The run task does no data processing. It never touches Iceberg, never calls an OEM API, never loads a Polars DataFrame.

Step tasks — the workers

For each step in the job's DAG, the executor launches a separate step task in ECS. Steps that have no dependency between them run concurrently. Each step task runs one Dagster op or asset materialization and exits.

This is where all the real work happens: HTTP fetches, Polars transforms, Iceberg reads and writes.

ECS: run task (coordinator)
└── spawns ──▶ ECS: step "audi/raw/scs_search"
└── spawns ──▶ ECS: step "audi/raw/pss_dealers"
└── spawns ──▶ ECS: step "audi/transformed/scs_inventory" (after raw completes)
└── spawns ──▶ ECS: step "audi/consolidated/inventory" (after transformed completes)
...

Resource sizing

Because the run task and step tasks have completely different workloads, they should be sized differently.

Run task — size it small

The run task's memory footprint is just Python startup cost: importing Dagster, loading the OEM module, building the Definitions object. After that it is essentially idle, waiting on DescribeTasks poll responses.

512 CPU / 2048 MB is sufficient. Going below 2 GB risks OOM during the Python import phase (Dagster + Polars + PyIceberg + all OEM modules), but there is no benefit to going above it.

Step tasks — size for the actual workload

Step tasks do the heavy lifting. The consolidated and transformed steps in particular load full Polars DataFrames in memory. 4096 CPU / 16384 MB (4 vCPU / 16 GB) is the current baseline, matching what the run task was previously sized to before the ECS executor was introduced.

Configuration

container_context.yaml

Each OEM project has a container_context.yaml that Dagster Cloud merges into every ECS task it launches for that code location:

ecs:
run_resources:
# Coordinator only — step tasks are sized separately
cpu: "512"
memory: "2048"
env_vars:
- DAGSTER_EXECUTOR=dagster_aws.ecs.ecs_executor

run_resources controls the run task (coordinator). It does not affect step task sizing.

DAGSTER_EXECUTOR is read by executor_from_env() in ai_core/core_defs.py at definition-load time. When unset (local development), the in-process executor is used instead — steps run sequentially inside a single process, no ECS tasks are spawned.

Configuring step task resources

Step task CPU and memory are set through the ECS executor's run config. The two practical options are:

Per-asset tags — apply resource settings to individual heavy assets in Python:

@dg.asset(
tags={"ecs/cpu": "4096", "ecs/memory": "16384"},
)
def consolidated_inventory(...):
...

Executor config in a schedule — override resources for all steps in a scheduled run:

@dg.schedule(cron_schedule="0 6 * * *", job=my_job)
def daily_run(_context):
return dg.RunRequest(
run_config={
"execution": {
"config": {
"cpu": 4096,
"memory": 16384,
}
}
}
)

If neither is set, step tasks fall back to the task definition's registered baseline.

Local development

Locally, DAGSTER_EXECUTOR is not set, so executor_from_env() returns in_process_executor. All steps run inside the single dg dev process — no ECS tasks are spawned, no AWS credentials are needed for execution. The run_resources settings in container_context.yaml have no effect locally.

The downside of in-process execution is that a failing step crashes the entire run. In production, a failing step task exits and Dagster retries or surfaces the error while leaving other concurrent steps unaffected.