DealerConnect Source
The dealerconnect source reads pre-parsed PDF blob data from BigQuery tables populated by
the ppm-ordering-services collection pipeline. It is specific to the Stellantis project
(projects/ai_stellantis).
BigQuery Tables
| Table pattern | Description | Partition |
|---|---|---|
| inventory_{dealerId} | In-stock vehicles with POC blobs | DAY (fetch_date) |
| sales_{dealerId} | Sold vehicles with POC blobs | None (filtered by fetch_date + sale_date) |
| code_guides | Trim/equipment spec blobs | None (filtered by fetch_date + blob_id dedup) |
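Because inventory tables are DAY-partitioned on fetch_date, filtering on that column lets BigQuery scan only the requested day. A minimal sketch of building such a query for one dealer follows. The helper name inventory_query, the dealer ID 12345, and the assumption that fetch_date is a DATE column bound as a query parameter named @fetch_date are all hypothetical. The sales and code_guides predicates are not shown because their exact form is not specified here.

```python
import re
from datetime import date


def inventory_query(project: str, dataset: str, dealer_id: str) -> str:
    """Build a partition-pruned query for one dealer's inventory table.

    Hypothetical helper: assumes fetch_date is a DATE column and that the
    requested date is bound as a query parameter named @fetch_date.
    """
    # Table names cannot be query parameters, so validate the dealer ID
    # before interpolating it into the table reference.
    if not re.fullmatch(r"[A-Za-z0-9_]+", dealer_id):
        raise ValueError(f"unexpected dealer ID: {dealer_id!r}")
    table = f"`{project}.{dataset}.inventory_{dealer_id}`"
    # Filtering on the partition column limits the scan to one DAY partition.
    return f"SELECT * FROM {table} WHERE fetch_date = @fetch_date"


sql = inventory_query("ai-app-stellantis", "dealerconnect", "12345")
params = {"fetch_date": date(2024, 1, 15)}  # hypothetical partition date
print(sql)
```

With google-cloud-bigquery, the date would be bound as a ScalarQueryParameter of type DATE and passed through QueryJobConfig(query_parameters=...).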
Dealer tables are discovered dynamically at runtime by listing all tables in the configured
dataset that match the relevant prefix (inventory_ or sales_). There is no static dealer ID
list — new dealers are picked up automatically as they are onboarded to the collection pipeline.
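The discovery step can be sketched as a prefix match over the dataset's table IDs. This is a minimal illustration, not the component's actual code: the helper name discover_dealer_tables is hypothetical, and it takes plain table IDs so it runs without BigQuery. In practice those IDs would come from something like [t.table_id for t in client.list_tables(dataset)].

```python
from collections.abc import Iterable


def discover_dealer_tables(table_ids: Iterable[str], prefix: str) -> dict[str, str]:
    """Map dealer ID -> table ID for every table whose name starts with prefix.

    Hypothetical helper; the real source lists tables from the configured dataset.
    """
    dealers: dict[str, str] = {}
    for table_id in table_ids:
        if table_id.startswith(prefix):
            dealer_id = table_id[len(prefix):]
            if dealer_id:  # ignore a bare "inventory_" or "sales_" table, if one existed
                dealers[dealer_id] = table_id
    return dealers


# Hypothetical table listing: two onboarded dealers plus the shared code_guides table.
tables = ["inventory_101", "inventory_202", "sales_101", "code_guides"]
print(discover_dealer_tables(tables, "inventory_"))  # {'101': 'inventory_101', '202': 'inventory_202'}
print(discover_dealer_tables(tables, "sales_"))      # {'101': 'sales_101'}
```

Since the mapping is rebuilt on every run, a newly onboarded dealer's table shows up without any code or configuration change.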
Required Environment Variables
| Variable | Description | Example |
|---|---|---|
| STELLANTIS_GCP_PROJECT | GCP project ID containing the DealerConnect dataset (defaults to ai-app-stellantis) | my-gcp-project |
| GCP_CREDENTIALS | Service account JSON key content (production only; omit to use ADC locally) | {"type":"service_account",...} |
The BigQuery dataset name defaults to dealerconnect and is set as a component YAML attribute
(dataset:) on each DealerConnectRawSourceComponent subclass — it is not an environment variable.
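The split between environment and YAML configuration can be illustrated by how the fully qualified dataset reference is assembled. This is a sketch under stated assumptions: dataset_ref is a hypothetical helper, treating an empty STELLANTIS_GCP_PROJECT as unset is an assumption, and the dataset name is passed in the way the component's dataset: attribute would supply it.

```python
import os
from collections.abc import Mapping

DEFAULT_PROJECT = "ai-app-stellantis"
DEFAULT_DATASET = "dealerconnect"


def dataset_ref(env: Mapping[str, str] = os.environ, dataset: str = DEFAULT_DATASET) -> str:
    """Return "<project>.<dataset>" for the DealerConnect source (hypothetical helper)."""
    # Project: from the environment, falling back to the documented default.
    # Assumption: an empty value is treated the same as an unset one.
    project = env.get("STELLANTIS_GCP_PROJECT") or DEFAULT_PROJECT
    # Dataset: supplied by the component's YAML `dataset:` attribute, never the environment.
    return f"{project}.{dataset}"


print(dataset_ref({}))                                             # ai-app-stellantis.dealerconnect
print(dataset_ref({"STELLANTIS_GCP_PROJECT": "my-gcp-project"}))  # my-gcp-project.dealerconnect
```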
Authentication
In local development, authenticate with Application Default Credentials (ADC):
gcloud auth application-default login
In production, set GCP_CREDENTIALS to the JSON content of a service account key (not a
file path). The BigQueryResource registered in definitions.py reads this env var via
gcp_credentials=dg.EnvVar("GCP_CREDENTIALS").
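A common mistake is setting GCP_CREDENTIALS to a key file path. A minimal validation sketch is shown below; the helper name service_account_info and its error messages are hypothetical, and the real BigQueryResource performs its own handling. A returned dict could be passed to google.oauth2.service_account.Credentials.from_service_account_info, while None means falling back to google.auth.default() (ADC).

```python
import json
from collections.abc import Mapping
from typing import Any, Optional


def service_account_info(env: Mapping[str, str]) -> Optional[dict[str, Any]]:
    """Parse GCP_CREDENTIALS, or return None to signal "use ADC" (hypothetical helper)."""
    raw = env.get("GCP_CREDENTIALS")
    if not raw:
        return None  # local development: rely on Application Default Credentials
    raw = raw.strip()
    # The variable must hold the JSON key content itself, not a path to the key file.
    if not raw.startswith("{"):
        raise ValueError("GCP_CREDENTIALS looks like a file path; set it to the JSON key content")
    info = json.loads(raw)
    if info.get("type") != "service_account":
        raise ValueError("GCP_CREDENTIALS is not a service account key")
    return info


print(service_account_info({}))  # None
```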
The service account requires the following BigQuery IAM roles:
- roles/bigquery.dataViewer (on the dataset): read table data
- roles/bigquery.metadataViewer (on the dataset): list tables (required for dealer discovery)
- roles/bigquery.jobUser (on the project where query jobs run; this role cannot be granted at dataset level): run queries
Dagster Resource Key
The BigQuery client is injected via the standard BIG_QUERY_RESOURCE_KEY resource (value:
"big_query_resource") exported from ai_core.components.raw_historical. All
DealerConnectRawSourceComponent subclasses declare this key in _required_resource_keys() and
obtain the client via context.resources[BIG_QUERY_RESOURCE_KEY].get_client().
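The resource-key pattern can be sketched with a stand-in class and a fake context. Everything here beyond the key value "big_query_resource" and the context.resources[...].get_client() lookup is an assumption: ExampleDealerSource is hypothetical, _required_resource_keys is assumed to return a set, and the fake context exists only so the sketch runs without Dagster.

```python
from types import SimpleNamespace

# In the real code this constant is imported from ai_core.components.raw_historical;
# it is redefined here only so the sketch runs on its own.
BIG_QUERY_RESOURCE_KEY = "big_query_resource"


class ExampleDealerSource:
    """Hypothetical stand-in for a DealerConnectRawSourceComponent subclass."""

    def _required_resource_keys(self) -> set[str]:
        # Declare the BigQuery resource key so it is available on context.resources.
        return {BIG_QUERY_RESOURCE_KEY}

    def _client(self, context):
        # Look up the resource by key, then ask it for a BigQuery client.
        return context.resources[BIG_QUERY_RESOURCE_KEY].get_client()


# Fake context and resource, for illustration only.
fake_resource = SimpleNamespace(get_client=lambda: "bq-client")
fake_context = SimpleNamespace(resources={BIG_QUERY_RESOURCE_KEY: fake_resource})
print(ExampleDealerSource()._client(fake_context))  # bq-client
```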
Raw Asset Contract
Each raw asset row has the standard RAW_ROW_SCHEMA columns. The _response_body column
contains the original BigQuery row serialized as a JSON dict, with every column from the
source table. This includes poc_raw_blob (for inventory/sales) or pdf_raw_blob (for code
guides), either of which may be null if the collection pipeline failed to parse that PDF.
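A minimal round-trip sketch of this contract follows. The helper names, the column vin, and the use of json.dumps(..., default=str) to encode DATE values as ISO strings are assumptions for illustration, not the pipeline's confirmed encoding.

```python
import json
from datetime import date
from typing import Any, Optional

BLOB_COLUMNS = ("poc_raw_blob", "pdf_raw_blob")


def to_response_body(bq_row: dict[str, Any]) -> str:
    # Keep every original column; default=str turns values json cannot encode
    # natively (such as datetime.date) into strings. Assumed, not confirmed.
    return json.dumps(bq_row, default=str)


def blob_from_response_body(response_body: str) -> Optional[Any]:
    # Return whichever blob column the row carries, or None if it is null or absent.
    row = json.loads(response_body)
    for column in BLOB_COLUMNS:
        if row.get(column) is not None:
            return row[column]
    return None


# Hypothetical inventory row whose PDF failed to parse upstream.
body = to_response_body({"vin": "VIN0001", "fetch_date": date(2024, 1, 15), "poc_raw_blob": None})
print(body)  # {"vin": "VIN0001", "fetch_date": "2024-01-15", "poc_raw_blob": null}
print(blob_from_response_body(body))  # None
```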
Operational Notes
- NULL blobs are retained in the raw asset so collection failures are auditable. The transform-tier extractors skip NULL blobs gracefully and log a warning per skip.
- Missing dealer tables are logged as warnings and skipped, so a partition can materialize successfully even if some dealers have no data for that date.
- If no tables are found, this is logged at info level and the result is an empty DataFrame. This is valid for early backfill dates, before any dealers were onboarded.
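The three behaviours above can be sketched together. This is a dependency-free illustration under stated assumptions: it returns plain lists instead of a DataFrame, TableMissing is a hypothetical stand-in for the client's not-found error (google.api_core.exceptions.NotFound in google-cloud-bigquery), and load_partition / extract_blobs are hypothetical names.

```python
import logging
from collections.abc import Callable, Iterable
from typing import Any

log = logging.getLogger("dealerconnect")


class TableMissing(Exception):
    """Hypothetical stand-in for the client's "table not found" error."""


def load_partition(dealer_tables: Iterable[str],
                   fetch_rows: Callable[[str], list[dict[str, Any]]]) -> list[dict[str, Any]]:
    tables = list(dealer_tables)
    if not tables:
        # Valid for early backfill dates: no dealers onboarded yet.
        log.info("no dealer tables found; returning empty result")
        return []
    rows: list[dict[str, Any]] = []
    for table in tables:
        try:
            rows.extend(fetch_rows(table))
        except TableMissing:
            # One missing dealer must not fail the whole partition.
            log.warning("table %s not found; skipping", table)
    return rows


def extract_blobs(rows: Iterable[dict[str, Any]], column: str) -> list[Any]:
    blobs = []
    for row in rows:
        blob = row.get(column)
        if blob is None:
            # NULL rows stay in the raw asset for auditing; the extractor only skips them.
            log.warning("skipping row with NULL %s", column)
            continue
        blobs.append(blob)
    return blobs


# Hypothetical data: dealer 202 has no table, and one of dealer 101's blobs is NULL.
data = {"inventory_101": [{"poc_raw_blob": "a"}, {"poc_raw_blob": None}]}


def fake_fetch(table: str) -> list[dict[str, Any]]:
    if table not in data:
        raise TableMissing(table)
    return data[table]


rows = load_partition(["inventory_101", "inventory_202"], fake_fetch)
print(extract_blobs(rows, "poc_raw_blob"))  # ['a']
```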