Nessie
Project Nessie is the Iceberg REST catalog for the platform. All three pipeline tiers — raw, transformed, and consolidated — write to Iceberg tables registered in Nessie. S3 holds the data files; Nessie holds the catalog metadata and provides atomic branching and merging of table state so branch deployments can run without touching production data.
Deployment
Nessie runs as a Fargate service in the AWS VPC (account 999655274916, region us-east-1), registered via Cloud Map for service discovery within the VPC. The public endpoint is fronted by an ALB and secured by Auth0.
Connectivity
How you connect depends on where the code runs:
| Context | NESSIE_URI | Authentication |
|---|---|---|
| ECS code-location tasks (inside VPC) | http://nessie.ppm.internal:19120 | None — VPC-internal, no auth required |
| Local development | https://nessie.app.autointel.ai/ | Auth0 client credentials |
| CI / branch deployments | https://nessie.app.autointel.ai/ | Auth0 client credentials |
ECS tasks that connect via the VPC-internal address do not set NESSIE_AUTH — the endpoint is only reachable inside the private network.
Auth0 authentication
The public endpoint requires Auth0 client credentials. Set NESSIE_AUTH=auth0 and supply the credentials in your .env:
NESSIE_AUTH=auth0
NESSIE_AUTH0_DOMAIN=<auth0-tenant>.auth0.com
NESSIE_AUTH0_CLIENT_ID=<m2m-client-id>
NESSIE_AUTH0_CLIENT_SECRET=<m2m-client-secret>
NESSIE_AUTH0_API_AUDIENCE=<api-audience>
Get these values from a team member. The credentials that CI uses also work for local read access. Tokens are fetched and cached automatically, and are refreshed before they expire.
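For orientation, the sketch below shows roughly what an Auth0 client-credentials exchange looks like using the variables above. It is not the platform's implementation: the helper names `build_token_request` and `fetch_token` are hypothetical, and the sketch omits the caching and refresh that the platform performs for you. It assumes the standard Auth0 `/oauth/token` endpoint on the tenant domain.

```python
import json
import os
import urllib.request


def build_token_request(env):
    """Return (url, payload) for an Auth0 client-credentials token request.

    Hypothetical helper for illustration; reads the NESSIE_AUTH0_* variables.
    """
    url = f"https://{env['NESSIE_AUTH0_DOMAIN']}/oauth/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": env["NESSIE_AUTH0_CLIENT_ID"],
        "client_secret": env["NESSIE_AUTH0_CLIENT_SECRET"],
        "audience": env["NESSIE_AUTH0_API_AUDIENCE"],
    }
    return url, payload


def fetch_token(env=os.environ, timeout=10):
    """POST the request and return the parsed JSON response.

    Performs a network call; no caching or refresh is attempted here.
    """
    url, payload = build_token_request(env)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A successful response normally carries an `access_token` and an `expires_in` value, which is what makes refresh-before-expiry possible.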
How the platform uses it
All Iceberg IO routes through build_iceberg_io_manager in ai_core.io_managers.iceberg. When NESSIE_URI is set, the IO manager connects to the Nessie REST catalog in place of the local SQLite fallback. The active catalog branch is derived from the Dagster deployment name in branch deployments and falls back to main otherwise.
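The branch rule described above can be sketched as follows. This is an illustration, not the code in build_iceberg_io_manager: the function name `resolve_nessie_branch` is hypothetical, and the choice of `DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT` and `DAGSTER_CLOUD_DEPLOYMENT_NAME` as inputs is an assumption, since this page does not say which variables the IO manager actually reads.

```python
import os


def resolve_nessie_branch(env=os.environ, default="main"):
    """Return the deployment name in a branch deployment, else the default.

    Assumption: the deployment is identified through Dagster's
    DAGSTER_CLOUD_* environment variables.
    """
    is_branch = env.get("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT", "") == "1"
    name = env.get("DAGSTER_CLOUD_DEPLOYMENT_NAME", "").strip()
    if is_branch and name:
        return name
    return default
```

Falling back to main whenever either value is missing keeps local runs and production deployments on the same branch without extra configuration.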
Set NESSIE_READ_ONLY=1 alongside NESSIE_URI to read from Nessie while routing any write outputs to the local catalog — useful for testing transformed or consolidated assets locally against production raw data without writing back to production.
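As a rough model of the read-only behaviour, the sketch below maps the two variables to the catalog used for reads and the catalog used for writes. The function `select_catalogs` and the labels "nessie" and "local" are hypothetical and exist only to make the routing rule concrete; the real IO manager may structure this differently.

```python
def select_catalogs(env):
    """Return (read_catalog, write_catalog) labels for the given environment."""
    if not env.get("NESSIE_URI"):
        # No Nessie URI: everything uses the local SQLite fallback.
        return "local", "local"
    if env.get("NESSIE_READ_ONLY") == "1":
        # Read from Nessie, but keep all writes in the local catalog.
        return "nessie", "local"
    return "nessie", "nessie"
```

Note that in this model NESSIE_READ_ONLY has no effect unless NESSIE_URI is also set.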
Utilities
packages/ai_core/src/ai_core/nessie.py provides lightweight helpers for working with Nessie outside of the Dagster IO manager:
- nessie_catalog_properties: builds the pyiceberg catalog property dict for a given URI, warehouse, and branch; includes the auth block when auth is supplied
- load_nessie_catalog: constructs and returns a pyiceberg Catalog from env vars; used when code needs to interact with Iceberg tables directly
- nessie_client / NessieClient: thin wrapper around the Nessie REST API v2 for branch management (create, delete, merge); auth is resolved from the NESSIE_AUTH env vars automatically
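To give a feel for what a pyiceberg property dict for a Nessie-backed REST catalog might contain, here is a minimal sketch. It does not reproduce nessie_catalog_properties: the function name `sketch_catalog_properties`, the `/iceberg/<branch>` URI layout, and the use of a bearer `token` property as the auth block are all assumptions made for illustration, and the warehouse value in the usage note is a placeholder.

```python
def sketch_catalog_properties(uri, warehouse, branch="main", token=None):
    """Build an illustrative property dict for a pyiceberg REST catalog.

    Assumptions: the Iceberg REST endpoint lives under /iceberg/<branch>,
    and auth is passed as a pre-fetched bearer token.
    """
    props = {
        "type": "rest",
        "uri": f"{uri.rstrip('/')}/iceberg/{branch}",
        "warehouse": warehouse,
    }
    if token is not None:
        # Only include auth when a token is supplied (e.g. the public endpoint).
        props["token"] = token
    return props
```

A dict like this could then be passed to pyiceberg as `load_catalog("nessie", **props)`, for example with `sketch_catalog_properties("http://nessie.ppm.internal:19120", "<warehouse>")` inside the VPC, where no token is needed.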
Environment variables
| Variable | Purpose |
|---|---|
| NESSIE_URI | Nessie base URI. Selects the Nessie catalog over the local SQLite fallback when set. |
| NESSIE_READ_ONLY | Set to 1 to read from Nessie but write outputs to the local catalog. |
| NESSIE_AUTH | Auth strategy. Set to auth0 for the public endpoint; leave unset inside the VPC. |
| NESSIE_AUTH0_DOMAIN | Auth0 tenant domain (e.g. example.auth0.com). Required when NESSIE_AUTH=auth0. |
| NESSIE_AUTH0_CLIENT_ID | Auth0 M2M application client ID. Required when NESSIE_AUTH=auth0. |
| NESSIE_AUTH0_CLIENT_SECRET | Auth0 M2M application client secret. Required when NESSIE_AUTH=auth0. |
| NESSIE_AUTH0_API_AUDIENCE | Auth0 API audience (aud claim). Required when NESSIE_AUTH=auth0. |
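The "Required when NESSIE_AUTH=auth0" rule in the table can be checked up front with a few lines of Python. This is a hypothetical pre-flight helper, not something provided by ai_core; the name `missing_auth0_vars` is invented for this sketch.

```python
import os

# Variables the table marks as required when NESSIE_AUTH=auth0.
REQUIRED_FOR_AUTH0 = (
    "NESSIE_AUTH0_DOMAIN",
    "NESSIE_AUTH0_CLIENT_ID",
    "NESSIE_AUTH0_CLIENT_SECRET",
    "NESSIE_AUTH0_API_AUDIENCE",
)


def missing_auth0_vars(env=os.environ):
    """Return the required Auth0 variables that are unset or empty.

    Returns an empty list when NESSIE_AUTH is not set to auth0,
    since no credentials are needed in that case.
    """
    if env.get("NESSIE_AUTH") != "auth0":
        return []
    return [name for name in REQUIRED_FOR_AUTH0 if not env.get(name)]
```

Calling `missing_auth0_vars()` before connecting to the public endpoint gives a clear list of what is absent from the .env file instead of a failed token request.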