Merge Spec
The merge block controls how records are matched across sources and how field values are resolved when sources disagree.
merge:
  match_on:
    - [vin]  # tier 1: tried first; a row with a non-empty VIN is claimed here
    - [von]  # tier 2: fallback for rows whose VIN is empty/null
  coalesce:
    use_latest: [price]   # prefer the value from the source with the most recent last_seen
    evict_stale: [offer]  # ignore a source's value if its last_seen is not the current partition date
match_on
An ordered list of field sets used to match right-side rows against the accumulated left side. Each right row is tried against the tiers in order; the first tier whose fields are all non-empty on both the right row and the left row claims the row. A claimed row is never retried on a lower tier.
When rows match, the left (accumulated) side wins on shared columns unless coalesce overrides that policy for specific fields.
merge:
  match_on:
    - [vin]  # tier 1: rows with a non-empty VIN are matched here
    - [von]  # tier 2: rows without a VIN fall through to VON matching
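The tier walk described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the engine's implementation: rows are plain dicts, `None` and blank strings count as empty, and a tier claims a right row as soon as all of its fields are non-empty on that row, so a row with a VIN that finds no VIN match is left unmatched rather than retried on VON. The names `match_row`, `merge_matched`, and `is_empty` are hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]


def is_empty(value: Any) -> bool:
    # Assumption: None and blank strings count as empty.
    return value is None or (isinstance(value, str) and not value.strip())


def match_row(right: Row, left_rows: list[Row],
              tiers: list[list[str]]) -> tuple[Optional[int], Optional[int]]:
    """Return (tier_index, left_index) for one right row.

    (None, None): no tier claimed the row.
    (t, None):    tier t claimed the row but found no matching left row;
                  the row is not retried on lower tiers.
    """
    for t, fields in enumerate(tiers):
        if any(is_empty(right.get(f)) for f in fields):
            continue  # a tier field is empty on the right row: fall through
        for i, left in enumerate(left_rows):
            if all(not is_empty(left.get(f)) and left.get(f) == right[f]
                   for f in fields):
                return t, i
        return t, None  # claimed by tier t, but unmatched
    return None, None


def merge_matched(left: Row, right: Row) -> Row:
    # Left (accumulated) wins on shared columns; right only adds new columns.
    return {**right, **left}


tiers = [["vin"], ["von"]]
left_rows = [
    {"vin": "VIN-A", "von": "V-100", "price": 20000},
    {"vin": None, "von": "V-200", "price": 18000},
]
print(match_row({"vin": "VIN-A", "von": "V-999"}, left_rows, tiers))  # (0, 0)
print(match_row({"vin": "", "von": "V-200"}, left_rows, tiers))       # (1, 1)
print(match_row({"vin": "VIN-Z", "von": "V-100"}, left_rows, tiers))  # (0, None)
```

The third call shows the "never retried" rule: the row's VON would match the first left row, but tier 1 already claimed it.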
coalesce
Field-level overrides for how values are selected across sources. By default every field is resolved by source priority order alone.
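As a rough illustration of the default, the sketch below resolves one field from per-source values listed in priority order. It assumes that the list is ordered highest priority first and that null values are skipped in the manner of SQL `COALESCE`; the spec itself says only that priority order decides. `resolve_by_priority` is a hypothetical name.

```python
from typing import Any, Optional


def resolve_by_priority(values: list[Optional[Any]]) -> Optional[Any]:
    """Resolve one field from per-source values ordered highest priority first.

    Assumption: like SQL COALESCE, the first non-null value wins; if every
    source is null, the merged value is null.
    """
    for value in values:
        if value is not None:
            return value
    return None


# The highest-priority source has no value, so the next source's value is used.
print(resolve_by_priority([None, 19500, 21000]))  # 19500
```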
use_latest
Selects the value from the source with the most recent last_seen date. If two sources share the same last_seen, priority order breaks the tie.
coalesce:
  use_latest: [price, dealer_code, estimated_ship_date]
Use this for fields that should always reflect the freshest observation regardless of source priority — for example, a price that may have been updated in a lower-priority source after the higher-priority source last reported it.
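A minimal sketch of use_latest for a single field, assuming each source's contribution is modelled as a hypothetical `Observation` record with a numeric priority (0 = highest) and a `last_seen` date. The spec does not say whether a null value from the freshest source still wins; this sketch does not skip nulls. `resolve_use_latest` and the source names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass
class Observation:
    source: str      # hypothetical source name
    priority: int    # assumption: 0 = highest priority
    last_seen: date
    value: Any


def resolve_use_latest(observations: list[Observation]) -> Optional[Any]:
    """Return the value from the source with the most recent last_seen.

    Ties on last_seen are broken by priority (lower number wins). Null values
    are not skipped. Returns None when there are no observations.
    """
    if not observations:
        return None
    best = min(observations,
               key=lambda o: (-o.last_seen.toordinal(), o.priority))
    return best.value


obs = [
    Observation("dealer_feed", 0, date(2024, 5, 1), 20000),
    Observation("price_scrape", 1, date(2024, 5, 3), 19500),
]
print(resolve_use_latest(obs))  # 19500: fresher, despite lower priority

tied = [
    Observation("dealer_feed", 0, date(2024, 5, 3), 20000),
    Observation("price_scrape", 1, date(2024, 5, 3), 19500),
]
print(resolve_use_latest(tied))  # 20000: same date, higher priority wins
```

Keying on the pair (newest date, then priority) in one `min` call keeps the tie-break rule in a single place.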
evict_stale
Ignores a source's value for this field if that source's last_seen is not the current partition date — effectively dropping stale sources from consideration before priority order is applied.
coalesce:
  evict_stale: [offer_status]
Use this for fields that are only meaningful when fresh. If all sources are stale for a given field, the field is null on the merged row.
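A minimal sketch of evict_stale, again using a hypothetical `Observation` record and an explicit `partition_date` argument in place of however the engine actually supplies the current partition. Eviction happens first; priority order is then applied to whatever survives. Assumption: among the fresh sources, the first non-null value in priority order wins. The status values are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass
class Observation:
    source: str      # hypothetical source name
    priority: int    # assumption: 0 = highest priority
    last_seen: date
    value: Any


def resolve_evict_stale(observations: list[Observation],
                        partition_date: date) -> Optional[Any]:
    # Evict every source whose last_seen is not the current partition date.
    fresh = [o for o in observations if o.last_seen == partition_date]
    # Apply priority order to the survivors; assumption: first non-null wins.
    for o in sorted(fresh, key=lambda o: o.priority):
        if o.value is not None:
            return o.value
    # Every source was stale (or null), so the merged field is null.
    return None


obs = [
    Observation("dealer_feed", 0, date(2024, 5, 2), "ACTIVE"),
    Observation("inventory_api", 1, date(2024, 5, 3), "SOLD"),
]
print(resolve_evict_stale(obs, date(2024, 5, 3)))  # SOLD: dealer_feed is stale
print(resolve_evict_stale(obs, date(2024, 5, 4)))  # None: all sources stale
```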