Debugging ECS Containers with ECS Exec
ECS Exec uses AWS Systems Manager (SSM) to open an interactive shell session inside a running Fargate container. This is the primary way to debug issues on infrastructure containers like Nessie or OEM code-location tasks.
Prerequisites
Install the Session Manager plugin for the AWS CLI:
# macOS
brew install --cask session-manager-plugin
# Verify
session-manager-plugin --version
You also need AWS CLI v2 configured with credentials that have permission to call
ecs:ExecuteCommand on the target cluster.
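The prerequisites above can be sanity-checked before attempting a session. The following is a minimal sketch, assuming that `aws --version` prints a string beginning with `aws-cli/2.` for CLI v2; `is_aws_cli_v2` is a hypothetical helper name, not part of any AWS tooling.

```shell
# Sketch only: is_aws_cli_v2 is a hypothetical helper, not part of the AWS CLI.
# It checks whether a version string (as printed by `aws --version`) is from CLI v2.
is_aws_cli_v2() {
  case "$1" in
    aws-cli/2.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (run manually; requires the AWS CLI and valid credentials):
#   is_aws_cli_v2 "$(aws --version 2>&1)" || echo "AWS CLI v2 not detected"
#   aws sts get-caller-identity   # shows which credentials are in use
```

The function only inspects the string it is given, so it can be used in a setup script without calling AWS.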
Connecting to Nessie
1. Find the running task
aws ecs list-tasks \
--cluster dagster-hybrid-agent-AgentCluster \
--service-name nessie \
--region us-east-1 \
--query 'taskArns[0]' \
--output text
This returns a task ARN like arn:aws:ecs:us-east-1:999655274916:task/dagster-hybrid-agent-AgentCluster/abc123.
2. Start an interactive session
aws ecs execute-command \
--cluster dagster-hybrid-agent-AgentCluster \
--task <task-id> \
--container nessie \
--interactive \
--command "/bin/sh" \
--region us-east-1
Replace <task-id> with the full task ARN or just the ID portion (abc123).
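Steps 1 and 2 can be combined into one helper. This is a sketch rather than a supported script: `task_id_from_arn` and `ecs_shell` are hypothetical names, the region is hard-coded to match the commands above, and it assumes the AWS CLI prints the literal text `None` when `taskArns[0]` is empty under `--output text`.

```shell
# Print the ID portion of a task ARN (everything after the last "/").
# A bare task ID is returned unchanged.
task_id_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Hypothetical wrapper: find the first task of a service and open /bin/sh in it.
# Usage: ecs_shell <cluster> <service-name> <container-name>
ecs_shell() {
  cluster=$1
  service=$2
  container=$3
  task_arn=$(aws ecs list-tasks \
    --cluster "$cluster" \
    --service-name "$service" \
    --region us-east-1 \
    --query 'taskArns[0]' \
    --output text) || return 1
  # Assumption: an empty task list is rendered as "None" in text output.
  if [ -z "$task_arn" ] || [ "$task_arn" = "None" ]; then
    echo "no task found for service $service" >&2
    return 1
  fi
  aws ecs execute-command \
    --cluster "$cluster" \
    --task "$(task_id_from_arn "$task_arn")" \
    --container "$container" \
    --interactive \
    --command "/bin/sh" \
    --region us-east-1
}
```

For Nessie this would be invoked as `ecs_shell dagster-hybrid-agent-AgentCluster nessie nessie`.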
3. Common debugging commands
Once inside the container:
# Check Nessie health
curl -s http://localhost:9000/q/health/ready | python3 -m json.tool
# Check Nessie API version
curl -s http://localhost:19120/api/v2/config
# List Iceberg namespaces
curl -s http://localhost:19120/iceberg/v1/namespaces
# Check Nessie-related environment variables (output is NOT redacted and may include secrets)
env | grep -i nessie | sort
# Check memory usage of PID 1 (assumed to be the Nessie JVM process)
grep -i vm /proc/1/status
# Check disk usage
df -h
# Look for locally written log files, newest first (if not using awslogs exclusively)
ls -lt /tmp/
Connecting to OEM Code-Location Containers
OEM code-location containers are launched by the Dagster agent as separate ECS tasks. They run in the same cluster but as distinct services.
Find the service name
aws ecs list-services \
--cluster dagster-hybrid-agent-AgentCluster \
--region us-east-1 \
--query 'serviceArns[*]' \
--output table
Connect
# List tasks for the code location
aws ecs list-tasks \
--cluster dagster-hybrid-agent-AgentCluster \
--service-name <service-name> \
--region us-east-1
# Connect
aws ecs execute-command \
--cluster dagster-hybrid-agent-AgentCluster \
--task <task-id> \
--container <container-name> \
--interactive \
--command "/bin/sh" \
--region us-east-1
ECS Exec must be enabled on the ECS service that runs the target container. The Nessie
service has this enabled via Terraform (enable_execute_command = true). Dagster
agent-managed code-location services may not have it enabled by default; check the
CloudFormation template or Dagster agent configuration.
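Whether ECS Exec is enabled can also be read back from ECS itself rather than from the templates. The sketch below uses the `enableExecuteCommand` field returned by `describe-services` and `describe-tasks`; the function names are hypothetical and the region is assumed to be us-east-1 as elsewhere in this guide.

```shell
# Hypothetical helpers. Each prints true or false as JSON,
# or null if the service or task was not found.

# Usage: service_exec_enabled <cluster> <service-name>
service_exec_enabled() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --region us-east-1 \
    --query 'services[0].enableExecuteCommand' \
    --output json
}

# Usage: task_exec_enabled <cluster> <task-id-or-arn>
task_exec_enabled() {
  aws ecs describe-tasks \
    --cluster "$1" \
    --tasks "$2" \
    --region us-east-1 \
    --query 'tasks[0].enableExecuteCommand' \
    --output json
}
```

Checking the task as well as the service is useful because a task started before the setting was turned on keeps its original value.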
Troubleshooting
"The execute command failed"
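The SSM agent inside the container needs outbound HTTPS (port 443) to reach the Systems Manager Session Manager (ssmmessages) endpoints. Nessie tasks already allow this. If a different container fails, check its security group egress rules and confirm that its subnet has a route to those endpoints (NAT or a VPC endpoint).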
"TargetNotConnectedException"
The SSM agent hasn't started yet or the task is still initializing. Wait 30-60 seconds
after the task enters RUNNING state and retry.
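Instead of waiting a fixed interval, the agent status can be polled. ECS reports the ECS Exec agent under `managedAgents` with the name `ExecuteCommandAgent` in `describe-tasks` output. The following sketch polls that field; the function names, the default of 12 attempts, and the 5-second delay are illustrative assumptions chosen to cover roughly 60 seconds.

```shell
# Print the lastStatus of the ECS Exec agent for one container in a task.
# Usage: exec_agent_status <cluster> <task-id-or-arn> <container-name>
exec_agent_status() {
  aws ecs describe-tasks \
    --cluster "$1" \
    --tasks "$2" \
    --region us-east-1 \
    --query "tasks[0].containers[?name=='$3'].managedAgents[] | [?name=='ExecuteCommandAgent'].lastStatus | [0]" \
    --output text
}

# Poll until the agent reports RUNNING, or give up.
# Usage: wait_for_exec_agent <cluster> <task> <container> [tries] [delay-seconds]
wait_for_exec_agent() {
  tries=${4:-12}
  delay=${5:-5}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(exec_agent_status "$1" "$2" "$3")" = "RUNNING" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "ExecuteCommandAgent not RUNNING after $tries checks" >&2
  return 1
}
```

If the status never reaches RUNNING, the cause is more likely a permissions or networking problem than startup delay.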
"An error occurred (InvalidParameterException)"
Verify that enable_execute_command is true on the ECS service. For Nessie this is
set in deployments/aws/terraform/solutions/dagster-agent/nessie_ecs.tf. Tasks started
before the setting was enabled cannot be connected to, so changing it requires a new
deployment of the service (Terraform apply triggers a service update, which rolls out
a new task with the SSM agent injected into the container; ECS Exec does not add a
sidecar container).
IAM permissions
The task role (not the execution role) must allow the ssmmessages actions listed
below. For Nessie, these are granted by the nessie-ssm-exec policy in
deployments/aws/terraform/solutions/dagster-agent/nessie_iam.tf.
Required actions:
ssmmessages:CreateControlChannel
ssmmessages:CreateDataChannel
ssmmessages:OpenControlChannel
ssmmessages:OpenDataChannel
The caller (your IAM user/role) needs:
ecs:ExecuteCommand
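ecs:DescribeTasks
ecs:ListTasks and ecs:ListServices (for the list-tasks and list-services commands used in this guide)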
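Caller permissions can be checked with the IAM policy simulator before debugging further. This is a sketch under several assumptions: `caller_can_exec` is a hypothetical helper, the ARN passed in must be an IAM user or role ARN (not an assumed-role session ARN), the caller needs iam:SimulatePrincipalPolicy, and the simulation here does not evaluate resource-level conditions.

```shell
# Print the simulated decision for each required action and
# return non-zero unless every action is "allowed".
# Usage: caller_can_exec <iam-user-or-role-arn>
caller_can_exec() {
  results=$(aws iam simulate-principal-policy \
    --policy-source-arn "$1" \
    --action-names ecs:ExecuteCommand ecs:DescribeTasks \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text) || return 2
  printf '%s\n' "$results"
  # Any line not ending in "allowed" is an implicitDeny or explicitDeny.
  if printf '%s\n' "$results" | grep -qv 'allowed$'; then
    return 1
  fi
  return 0
}
```

A return code of 1 means at least one action was denied; 2 means the simulation call itself failed.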