Ingestion Operations
Local Configuration
Environment variables are loaded automatically from projects/ai_<oem>/.env when
dg dev starts. Copy the relevant .env file and fill in the values for your
machine before running locally.
Required variables
| Variable | Purpose |
|---|---|
| `RAW_STORAGE_URI` | fsspec-compatible URI where raw JSON assets are written (e.g. `temp/raw`). |
| `ICEBERG_CATALOG_URI` | SQLite connection string for the local Iceberg catalog (e.g. `sqlite:////absolute/path/temp/iceberg/catalog.db`). |
| `ICEBERG_WAREHOUSE` | Local warehouse root for Iceberg data files (e.g. `file:///absolute/path/temp/iceberg/warehouse`). Note: three slashes for `file://` URIs on absolute paths. |
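
The slash counts in the two Iceberg values are easy to get wrong. The sketch below derives both from one absolute directory. The helper name `local_iceberg_uris` is hypothetical and not part of the project, and the `catalog.db` / `warehouse` layout only mirrors the examples in the table.

```python
from pathlib import PurePosixPath
from typing import Dict
from urllib.parse import quote


def local_iceberg_uris(iceberg_root: str) -> Dict[str, str]:
    """Build ICEBERG_CATALOG_URI and ICEBERG_WAREHOUSE from an absolute POSIX path.

    Hypothetical helper for illustration; the file and directory names are
    assumptions copied from the table examples.
    """
    root = PurePosixPath(iceberg_root)
    if not root.is_absolute():
        raise ValueError(f"expected an absolute path, got {iceberg_root!r}")
    return {
        # "sqlite:///" followed by a path that starts with "/" gives four slashes.
        "ICEBERG_CATALOG_URI": f"sqlite:///{root / 'catalog.db'}",
        # "file://" followed by a path that starts with "/" gives three slashes.
        "ICEBERG_WAREHOUSE": "file://" + quote(str(root / "warehouse")),
    }


if __name__ == "__main__":
    for name, value in local_iceberg_uris("/absolute/path/temp/iceberg").items():
        print(f"{name}={value}")
```

Paste the printed lines into your `.env` after replacing the root with a real path on your machine.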
Optional variables
| Variable | Purpose |
|---|---|
| `ICEBERG_CATALOG_TYPE` | Catalog backend type. Defaults to `sql`. Override in deployed environments (e.g. for Postgres). |
| `ICEBERG_CATALOG_NAME` | Name of the catalog entry. Defaults to `<oem>_local` (e.g. `audi_local`). Set to the OEM name in deployed environments. |
| `ICEBERG_INIT_DIRS` | Set to `1` on first setup or after wiping `temp/iceberg/` to auto-create the local SQLite and warehouse directories at startup. Leave unset day-to-day. |
| `DUCKDB_PATH` | Path to a DuckDB file for the transformed tier. Omit to use Dagster's default filesystem IO manager. |
| `PPM_SOURCE_OVERRIDES` | Path to a YAML file that injects data into raw or transformed assets instead of calling the live API. See below. |
| `NESSIE_URI` | Nessie REST catalog base URI. When set, the IO manager connects to Nessie instead of the local SQLite catalog. Used in deployed environments; leave unset for local development. |
| `NESSIE_READ_ONLY` | Set to `1` to read from Nessie but write outputs to the local SQLite/file catalog. Useful for testing transformed or consolidated assets against production raw data locally. |
For NESSIE_AUTH and Auth0 credentials, see Nessie.
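
The interaction between `NESSIE_URI`, `NESSIE_READ_ONLY`, and the catalog defaults can be summarised as a small decision function. This is a sketch of the behaviour described in the table, not the IO manager's actual code. `CatalogPlan` and `plan_catalog` are hypothetical names, and treating `NESSIE_READ_ONLY` as a no-op when `NESSIE_URI` is unset is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CatalogPlan:
    read_from: str     # "nessie" or "local"
    write_to: str      # "nessie" or "local"
    catalog_type: str  # value of ICEBERG_CATALOG_TYPE, default "sql"
    catalog_name: str  # value of ICEBERG_CATALOG_NAME, default "<oem>_local"


def plan_catalog(env: Mapping[str, str], oem: str) -> CatalogPlan:
    """Decide where reads and writes go, based on the documented variables."""
    use_nessie = bool(env.get("NESSIE_URI"))
    # Assumption: the read-only flag only matters when Nessie is configured.
    read_only = use_nessie and env.get("NESSIE_READ_ONLY") == "1"
    return CatalogPlan(
        read_from="nessie" if use_nessie else "local",
        write_to="nessie" if use_nessie and not read_only else "local",
        catalog_type=env.get("ICEBERG_CATALOG_TYPE", "sql"),
        catalog_name=env.get("ICEBERG_CATALOG_NAME", f"{oem}_local"),
    )


if __name__ == "__main__":
    # Placeholder URI, for illustration only.
    env = {"NESSIE_URI": "https://nessie.example/api", "NESSIE_READ_ONLY": "1"}
    print(plan_catalog(env, "audi"))
```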
First-time local setup
Create the Iceberg directories and initialize the catalog namespace once before running for the first time (or after a reset):
./scripts/reset_iceberg_local.sh --init # create dirs without wiping data
Then start dg dev with ICEBERG_INIT_DIRS=1 set for the first startup only.
Dagster creates the catalog namespace automatically on that run:
ICEBERG_INIT_DIRS=1 deployments/local/.venv/bin/dg dev
Subsequent starts do not need the flag.
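
For orientation, the directory creation that `ICEBERG_INIT_DIRS=1` triggers could look roughly like the sketch below. The project's real startup hook is not shown in this document, so `init_local_iceberg_dirs` is a hypothetical name and the URI parsing rules are assumptions based on the example values above.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Mapping
from urllib.parse import unquote, urlparse

SQLITE_PREFIX = "sqlite:///"


def init_local_iceberg_dirs(env: Mapping[str, str] = os.environ) -> List[Path]:
    """Create the SQLite parent directory and the warehouse directory.

    Does nothing unless ICEBERG_INIT_DIRS is exactly "1".
    """
    if env.get("ICEBERG_INIT_DIRS") != "1":
        return []
    catalog_uri = env["ICEBERG_CATALOG_URI"]
    if not catalog_uri.startswith(SQLITE_PREFIX):
        raise ValueError("only sqlite:/// catalog URIs are handled in this sketch")
    # Dropping "sqlite:///" from "sqlite:////abs/catalog.db" leaves "/abs/catalog.db".
    db_path = Path(catalog_uri[len(SQLITE_PREFIX):])
    # urlparse("file:///abs/warehouse").path is "/abs/warehouse".
    warehouse = Path(unquote(urlparse(env["ICEBERG_WAREHOUSE"]).path))
    created = []
    for directory in (db_path.parent, warehouse):
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    demo_env = {
        "ICEBERG_INIT_DIRS": "1",
        "ICEBERG_CATALOG_URI": f"sqlite:///{root}/iceberg/catalog.db",
        "ICEBERG_WAREHOUSE": f"file://{root}/iceberg/warehouse",
    }
    print(init_local_iceberg_dirs(demo_env))
```

Because `mkdir` is called with `exist_ok=True`, leaving the flag set by accident would be harmless in this sketch.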
Resetting local Iceberg state
To wipe the local catalog and all stored data and start fresh:
./scripts/reset_iceberg_local.sh
Then restart dg dev with ICEBERG_INIT_DIRS=1 as above.
Source overrides
The source override mechanism lets you inject data from BigQuery tables or local JSON files into any raw source or transformed entity asset, without modifying source module code. This is useful for local testing when live API access is unavailable or when you want to replay a known-good dataset.
Create a YAML file (e.g. temp/source_overrides.yaml) and point
PPM_SOURCE_OVERRIDES at it:
overrides:
  # Read from a BigQuery partitioned table.
  # partition_date defaults to the asset's partition key if omitted.
  - oem: audi
    source: onegraph
    resource: carlines
    type: bigquery
    table: "project.dataset.table"
    # partition_date: "2026-01-27"

  # Read from a local JSON file previously written by RawJsonIOManager.
  # {partition} is interpolated with the asset's partition key.
  - oem: audi
    source: pss
    resource: dealers
    type: local_file
    path: "temp/raw/audi/raw/pss_dealers/{partition}.json"

  # Override a transformed entity asset; this bypasses the extractor function.
  # The `entity` field distinguishes this from a raw source override.
  # Use partition_range to serve the same file for a range of dates.
  - oem: audi
    source: onegraph
    resource: carlines
    entity: models
    type: local_file
    path: "/absolute/path/to/models_2026-01-27.json"
    partition_range: ["2026-01-27", "2026-01-31"]
Override types:
- `bigquery`: queries ``SELECT entry FROM `table` WHERE DATE(_PARTITIONTIME) = '{partition}'``. Each row's `entry` column must be a JSON object; all rows are collected into a list. The `partition_date` field pins a specific date when you need to replay a partition that differs from the one being materialized.
- `local_file`: reads a JSON file at the given path. `{partition}` in the path is replaced with the asset's partition key.
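
The two override types come down to a query template and a path template. The sketch below shows how each could be resolved for a given partition. The function names are hypothetical, and the BigQuery call itself is omitted so that the example runs without credentials.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def bigquery_override_sql(
    table: str, partition_key: str, partition_date: Optional[str] = None
) -> str:
    """Render the query described above; partition_date wins when it is set."""
    partition = partition_date or partition_key
    return f"SELECT entry FROM `{table}` WHERE DATE(_PARTITIONTIME) = '{partition}'"


def read_local_file_override(path_template: str, partition_key: str) -> Any:
    """Replace {partition} in the path and load the JSON file it points to."""
    path = Path(path_template.replace("{partition}", partition_key))
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    print(bigquery_override_sql("project.dataset.table", "2026-01-27"))

    # Write a throwaway file so the local_file branch has something to read.
    folder = Path(tempfile.mkdtemp())
    (folder / "2026-01-27.json").write_text('[{"id": 1}]', encoding="utf-8")
    print(read_local_file_override(str(folder / "{partition}.json"), "2026-01-27"))
```

`str.replace` is used instead of `str.format` so that other braces in a path are left untouched.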
Additional fields:
- `entity`: when present, the override applies to the transformed entity asset instead of the raw source asset. Only `TransformedEntityComponent` checks for overrides with an `entity` field.
- `partition_range`: a two-element list `[start, end]` (inclusive). The override only applies to partitions within this range. Multiple overrides for the same source with non-overlapping ranges are allowed. Entries without `partition_range` match any partition.
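
The matching rules for `entity` and `partition_range` can be sketched as a lookup over the parsed `overrides` list. `find_override` is a hypothetical name. Returning the first match and treating partition keys as ISO dates are assumptions, since the document does not describe the real lookup.

```python
from datetime import date
from typing import Any, Mapping, Optional, Sequence

EXAMPLE_OVERRIDES = [
    {"oem": "audi", "source": "onegraph", "resource": "carlines",
     "type": "bigquery", "table": "project.dataset.table"},
    {"oem": "audi", "source": "onegraph", "resource": "carlines",
     "entity": "models", "type": "local_file",
     "path": "/absolute/path/to/models_2026-01-27.json",
     "partition_range": ["2026-01-27", "2026-01-31"]},
]


def find_override(
    overrides: Sequence[Mapping[str, Any]],
    *,
    oem: str,
    source: str,
    resource: str,
    partition: str,
    entity: Optional[str] = None,
) -> Optional[Mapping[str, Any]]:
    """Return the first override matching the asset and partition, else None."""
    day = date.fromisoformat(partition)
    for entry in overrides:
        if (entry["oem"], entry["source"], entry["resource"]) != (oem, source, resource):
            continue
        # Raw lookups pass entity=None, so entries carrying `entity` are skipped.
        if entry.get("entity") != entity:
            continue
        bounds = entry.get("partition_range")
        if bounds is not None:
            start, end = (date.fromisoformat(bound) for bound in bounds)
            if not start <= day <= end:  # both ends are inclusive
                continue
        return entry
    return None


if __name__ == "__main__":
    key = dict(oem="audi", source="onegraph", resource="carlines")
    print(find_override(EXAMPLE_OVERRIDES, partition="2026-01-27", **key))
    print(find_override(EXAMPLE_OVERRIDES, partition="2026-02-01", entity="models", **key))
```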
A full local launch command looks like this (environment variables are loaded from .env automatically):
macOS / Linux
cd projects/ai_audi
PPM_SOURCE_OVERRIDES="$(pwd)/../../temp/source_overrides.yaml" \
.venv/bin/dg launch \
--assets "audi/raw/onegraph_carlines,audi/transformed/models_from_onegraph_carlines,audi/consolidated/models" \
--partition 2026-01-27
Windows (PowerShell)
cd projects/ai_audi
$env:PPM_SOURCE_OVERRIDES = "$PWD/../../temp/source_overrides.yaml"
.venv\Scripts\dg.exe launch `
--assets "audi/raw/onegraph_carlines,audi/transformed/models_from_onegraph_carlines,audi/consolidated/models" `
--partition 2026-01-27
Viewing Asset Check Results
Asset checks run automatically after each materialization. To view results, open the asset in the Dagster UI and click the Checks tab.