Yield Correlation Between Inline Defect Data and WAT Probe Test Results: A Practical Engineering Guide
Inline defect data without electrical test context is a map without a legend. You know where defects are on the wafer, but you don't know which of them are killing die, and you don't know which process layers are the primary yield detractors. Joining inline defect records to WAT probe test results — specifically STDF-format probe data — gives you that context. This guide walks through the engineering mechanics of that join: how to match records across the two data systems, how to compute kill rates per defect family per layer, and what to do when the data is sparse.
The Data Sources: KLARF Inspection Records and STDF Probe Files
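On the inspection side, each wafer inspection event produces a KLARF file containing: the wafer ID and lot ID, the inspection timestamp, the tool ID, defect locations, defect size estimates, and per-defect classification metadata if auto-classification was run on the tool. In the common KLARF 1.2 layout, each defect's location is recorded as a die index (XINDEX, YINDEX) plus an offset within that die (XREL, YREL). The wafer-space XY position relative to the wafer center is reconstructed from these values together with the DiePitch, DieOrigin, and SampleCenterLocation header records. Multiple inspection steps across a lot's process flow produce multiple KLARF results per wafer, each associated with a specific process layer or inspection step code.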
On the electrical test side, wafer-level probe results are typically output in STDF (Standard Test Data Format, the STDF V4 specification originated by Teradyne). An STDF file contains the lot ID, the wafer ID, die-level results keyed to die-index XY coordinates whose orientation relative to the wafer notch is described in the file's wafer configuration record, and per-die per-test-parameter numeric values where applicable. Hard bin results, with each bin designated as pass or fail, are what you need for yield correlation. Soft bins and parametric test values are useful for more granular analysis, but the basic kill-rate computation does not require them.
The join is conceptually straightforward: match lot and wafer IDs across KLARF and STDF files, then spatially correlate defect positions on the wafer to die outcomes at probe. A defect whose centroid falls inside a failing die on the probe map is a candidate kill defect. A defect inside a passing die is not killing that die at current process conditions.
Lot-ID Matching: The Alignment Problem
In practice, lot ID formats are not always consistent across inspection tools and probe test systems. We've encountered fabs where the lot ID in the KLARF file includes a sublot suffix that is stripped in the STDF file — e.g., KLARF records "LOT12345-A" while STDF records "LOT12345". Others use facility-specific lot ID formats that include a carrier sequence number in the KLARF but not in the probe file. Some inspection tools append an internal tool job ID to the lot reference that has no counterpart in probe data at all.
Before building a kill-rate computation, the lot-ID normalization step must be explicitly designed and tested against your facility's actual data. The normalization rules are usually simple once identified — strip the last two characters, apply a regex to extract the base ID — but identifying them requires comparing actual KLARF and STDF files from the same lot, not relying on documentation that may be years out of date.
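The following is a minimal Python sketch of such rules. It applies an ordered list of regex substitutions. The two patterns, a one-letter sublot suffix like "-A" and an appended job tag like "_JOB77", are hypothetical stand-ins modeled on the mismatches described above. Replace them with whatever the side-by-side file comparison reveals.

```python
import re
from typing import List, Tuple

# Hypothetical rules, applied in order. Derive the real ones by comparing
# KLARF and STDF files from the same lot.
LOT_RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"_JOB\d+$"), ""),  # strip an appended tool job tag, e.g. "_JOB77"
    (re.compile(r"-[A-Z]$"), ""),   # strip a one-letter sublot suffix, e.g. "-A"
]


def normalize_lot_id(raw: str) -> str:
    """Map a lot ID from either system onto the shared base lot ID."""
    lot = raw.strip().upper()
    for pattern, replacement in LOT_RULES:
        lot = pattern.sub(replacement, lot)
    return lot
```

Storing the rules as data, rather than hard-coding string slicing, makes it easier to version them per tool and to unit-test them against known KLARF/STDF lot pairs.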
Wafer ID matching is generally more reliable than lot ID matching, because wafer sequence numbers within a lot are usually consistent across inspection and test systems. The combination of normalized lot ID plus wafer sequence number (wafer 1 through N in the lot) is usually a reliable join key, provided that wafers are not re-sorted within a carrier or moved between carriers during the process flow. Where laser-scribed wafer IDs are recorded on both sides, using them as the key removes the dependence on slot order.
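Once keys are normalized, it is worth making the match explicit rather than letting an inner join silently drop wafers. The sketch below assumes each side has already been loaded into a dict keyed by (normalized lot ID, wafer sequence number). The function name and input shapes are illustrative and do not correspond to any particular tool's API.

```python
from typing import Any, Dict, List, Tuple

JoinKey = Tuple[str, int]  # (normalized lot ID, wafer sequence number)


def match_wafers(
    klarf: Dict[JoinKey, Any],
    stdf: Dict[JoinKey, Any],
) -> Tuple[List[JoinKey], List[JoinKey], List[JoinKey]]:
    """Return (matched, inspection-only, probe-only) join keys, each sorted."""
    k, s = set(klarf), set(stdf)
    return sorted(k & s), sorted(k - s), sorted(s - k)
```

Reviewing the two unmatched lists for each lot is a quick way to catch a missing normalization rule or a wafer that changed slots.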
Spatial Correlation: Mapping Defect Coordinates to Die Outcomes
Once lot and wafer records are matched, the spatial correlation step maps each defect centroid to the die it falls on. This requires knowledge of the die step-and-repeat pattern: the die X and Y pitch in microns, and the die origin offset from the wafer center. This information is available from the product's dicing plan or from the probe program's wafer map configuration. In STDF, the wafer configuration record (WCR) carries the die width, die height, and center-die coordinates. In KLARF, the header carries DiePitch and DieOrigin.
Two coordinate systems must be aligned. KLARF defect positions, once expressed in wafer space, are in microns from wafer center, with axis orientation depending on the KLARF version and the orientation mark location recorded in the file. STDF die coordinates are die-index integers (column, row) relative to the notch orientation, with the positive X and Y directions given in the wafer configuration record. The conversion requires:
- Confirming the KLARF orientation record (OrientationMarkLocation in KLARF 1.2) and normalizing coordinate axes to a consistent reference (notch-down is a common convention).
- Computing die pitch from the product stepper plan or probe map header.
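- Converting wafer-space XY to die-index (col, row) using col = floor((X − X_origin) / die_pitch_X) and row = floor((Y − Y_origin) / die_pitch_Y), then applying whatever axis flip and index offset are needed to map that grid onto the STDF (col, row) numbering.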
- Cross-referencing the computed die index against the STDF hard bin result for that die.
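The following Python code sketches the steps above. It assumes KLARF positions have already been converted to notch-down wafer-space microns. It models the STDF grid as an integer sign flip plus an offset on each axis. DieGrid and its fields (col_sign, row_sign, col_offset, row_offset) are hypothetical names. You would derive their values from the probe map header and a known reference die.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class DieGrid:
    pitch_x: float        # die step in X, microns
    pitch_y: float        # die step in Y, microns
    origin_x: float       # wafer-space X of the low-X corner of grid cell (0, 0), microns
    origin_y: float       # wafer-space Y of the low-Y corner of grid cell (0, 0), microns
    col_sign: int = 1     # -1 if the STDF column index runs toward -X (assumed setting)
    row_sign: int = 1     # -1 if the STDF row index runs toward -Y (assumed setting)
    col_offset: int = 0   # STDF column of grid cell (0, 0) (assumed setting)
    row_offset: int = 0   # STDF row of grid cell (0, 0) (assumed setting)


def wafer_xy_to_die(x: float, y: float, g: DieGrid) -> Tuple[int, int]:
    """Map notch-down wafer-space XY (microns) to an STDF-style (col, row) index."""
    col = math.floor((x - g.origin_x) / g.pitch_x)
    row = math.floor((y - g.origin_y) / g.pitch_y)
    return g.col_sign * col + g.col_offset, g.row_sign * row + g.row_offset


def die_failed(x: float, y: float, g: DieGrid,
               hard_bin_pass: Dict[Tuple[int, int], bool]) -> Optional[bool]:
    """True if the die under (x, y) failed at probe, False if it passed, None if not probed."""
    passed = hard_bin_pass.get(wafer_xy_to_die(x, y, g))
    return None if passed is None else not passed
```

Returning None for unprobed die keeps edge die and skipped sites out of both the numerator and the denominator of the kill rate.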
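Defects that fall outside all die boundaries (on scribe lines, in edge exclusion zones, or on alignment marks) are discarded for kill-rate analysis but retained for spatial pattern analysis. Because the floor conversion assigns every point to some pitch cell, identifying scribe-line defects also requires comparing each defect's offset within its cell against the active die width and height. Scribe-line defects can indicate process issues even if they don't directly kill die.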
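Here is a minimal sketch of the in-die versus scribe versus edge classification. It makes three assumptions. The wafer center is at (0, 0). Each die sits at the low-X, low-Y corner of its pitch cell, with the street on the high side. The parameters die_w and die_h are the active die dimensions. The function name is hypothetical. Alignment marks are not modeled here and would need their own keep-out list.

```python
import math
from typing import Optional, Tuple


def die_or_scribe(
    x: float, y: float,
    pitch_x: float, pitch_y: float,
    origin_x: float, origin_y: float,
    die_w: float, die_h: float,
    usable_radius: float,
) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Classify a wafer-space point (microns) as 'die', 'scribe', or 'edge'."""
    # usable_radius is the wafer radius minus the edge exclusion width.
    if math.hypot(x, y) > usable_radius:
        return "edge", None
    col = math.floor((x - origin_x) / pitch_x)
    row = math.floor((y - origin_y) / pitch_y)
    # Offset inside the pitch cell; the die occupies [0, die_w) x [0, die_h).
    dx = (x - origin_x) - col * pitch_x
    dy = (y - origin_y) - row * pitch_y
    if dx < die_w and dy < die_h:
        return "die", (col, row)
    return "scribe", None
```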
Kill-Rate Attribution Per Defect Family Per Layer
With the spatial join complete, each defect record now carries a die outcome label: the die it falls on passed or failed at probe. The kill-rate computation is a ratio: for a given defect family at a given inspection layer, what fraction of die containing that defect type at that layer failed at probe?
Formally, for defect family F at inspection layer L:
- N_defect(F,L) = total die containing at least one defect of family F detected at layer L
- N_kill(F,L) = subset of those die that failed at probe (hard bin fail)
- Kill rate K(F,L) = N_kill(F,L) / N_defect(F,L)
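These definitions translate directly into code. In this sketch, each defect record is a hypothetical tuple (family, layer, die_key, die_failed), where die_key is unique across wafers, for example (lot, wafer, col, row). Keying by die_key ensures that a die with several defects of the same family at the same layer is counted once, which matches the "at least one defect" definition of N_defect.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# One record per detected defect: (family, layer, die_key, die_failed).
# die_key must be unique across wafers, e.g. (lot, wafer, col, row).
DefectRecord = Tuple[str, str, Hashable, bool]


def kill_rates(
    defects: Iterable[DefectRecord],
) -> Dict[Tuple[str, str], Tuple[int, int, float]]:
    """Return {(family, layer): (N_kill, N_defect, K)}."""
    dies: Dict[Tuple[str, str], Dict[Hashable, bool]] = defaultdict(dict)
    for family, layer, die_key, failed in defects:
        # A die with several defects of this family at this layer is counted once.
        dies[(family, layer)][die_key] = failed
    result = {}
    for key, outcome in dies.items():
        n_defect = len(outcome)
        n_kill = sum(outcome.values())
        result[key] = (n_kill, n_defect, n_kill / n_defect)
    return result
```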
The kill rate tells you which defect families and which process layers are the primary contributors to electrical yield loss. A bridging defect at the metal-1 layer with a kill rate of 0.83 is a high-priority target for process improvement. A particle defect at the STI layer with a kill rate of 0.04 is largely benign at current design rules. Without this ranked layer-attribution table, yield improvement efforts default to targeting whichever layer has the highest defect count — which is almost always the wrong metric.
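Kill-rate analysis across layers is the most direct quantitative evidence that a specific process step is electrically significant. A layer with 200 defects per wafer and a 0.02 kill rate costs roughly 4 die per wafer (200 × 0.02, treating each defect as landing on a different die). A layer with 8 defects and a 0.90 kill rate costs roughly 7 die per wafer (8 × 0.90). Sorted by defect count, the first layer ranks far ahead of the second. Sorted by expected die loss, the order reverses.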
Handling Timing Lag Between Inspection and Probe
Inline inspection happens during or immediately after a specific process step. Probe test happens at the end of the process flow, potentially days to weeks later depending on where the inspected layer sits in the flow and on fab cycle time. In that interval, additional lots pass through the same process steps and may be exposed to the same equipment excursion that produced the original defects.
For yield correlation, this timing lag creates an attribution problem. The defects observed at metal-1 inspection on lot A correlate to the probe results for lot A, but the excursion that caused them may have continued through lots B, C, and D before being caught. Looking only at the correlation for lot A gives a partial picture of the excursion's electrical impact.
Our approach is to compute kill rates not just on the triggering lot but over a rolling window: all lots that passed through the flagged equipment chamber and recipe version within the excursion time window. This multi-lot kill-rate computation better represents the true electrical impact and yields statistically more stable estimates. Rare defect families may have only 5 to 10 defect-die instances on a single lot, which gives a noisy kill-rate estimate. Pooling across 3 to 6 lots from the same excursion window stabilizes the estimate considerably.
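The following sketch shows the pooled computation. It assumes per-lot N_kill and N_defect counts for one (family, layer) pair have already been computed. It also assumes each lot's counts are tagged with the chamber, recipe, and process time of the flagged step. The LotResult fields and the function name are hypothetical. Summing counts before dividing weights each lot by its number of defect die, so a lot with 3 observations counts for less than one with 30. Averaging per-lot rates would give both lots equal weight.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class LotResult:
    lot_id: str
    chamber: str            # chamber that ran the flagged step (hypothetical field)
    recipe: str             # recipe version at that step (hypothetical field)
    processed_at: datetime  # when the lot ran the flagged step
    n_kill: int             # N_kill(F, L) for this lot
    n_defect: int           # N_defect(F, L) for this lot


def pooled_kill_rate(
    lots: Iterable[LotResult],
    chamber: str, recipe: str,
    window_start: datetime, window_end: datetime,
) -> Optional[float]:
    """Pooled K(F, L) over lots from one chamber/recipe inside the excursion window."""
    selected = [
        lot for lot in lots
        if lot.chamber == chamber
        and lot.recipe == recipe
        and window_start <= lot.processed_at <= window_end
    ]
    kills = sum(lot.n_kill for lot in selected)
    total = sum(lot.n_defect for lot in selected)
    return kills / total if total else None  # None when no defect die in the window
```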
Low-Defect-Count Correlation: When the Data Is Sparse
At advanced nodes, some process layers are inspected at a 5 to 10% sample rate rather than with full-lot coverage. A lot may have only 15 to 20 inspected wafers, each with 2 to 8 defects of the family of interest. A kill-rate estimate from such sparse data has high variance. An estimate of 0.60 from 5 observations has a normal-approximation 95% confidence interval of roughly ±0.43 (1.96 × √(0.60 × 0.40 / 5) ≈ 0.43). That interval spans most of the useful range, so the estimate is not actionable without qualification.
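The ±0.43 figure is the normal-approximation (Wald) interval. The sketch below computes it alongside the Wilson score interval, a standard alternative that stays inside [0, 1] and behaves better at small n. Using Wilson here is a suggestion, not something the analysis above prescribes.

```python
import math
from typing import Tuple


def wald_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval for a kill rate of k failures in n defect die."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval; always within [0, 1]."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

For 3 kills out of 5 defect die, the Wald interval runs from about 0.17 to 1.03, past the physical limit of 1. The Wilson interval runs from roughly 0.23 to 0.88.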
Several strategies are useful for sparse-data situations. The first is Bayesian updating against a prior kill rate estimated from prior lots at the same layer and defect family. This pulls the estimate toward the prior when data is sparse and lets new evidence override the prior as it accumulates. Suppose the historical prior for the metal-1 bridging kill rate is 0.75 from 120 observations, and a new lot has 3 die with bridging defects, 2 of which failed. The updated estimate is much closer to 0.75 than the raw 0.67 from the new data alone: treating the prior as 90 kills in 120 pseudo-observations gives (90 + 2) / (120 + 3) ≈ 0.748. With a beta prior, this posterior mean is exactly pseudo-count smoothing, and it is far more stable than the raw rate.
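Here is a minimal sketch of the beta-binomial update. It assumes the prior is expressed as a rate plus an effective number of observations (prior_n). The posterior mean adds the observed counts to the prior's pseudo-counts.

```python
def posterior_kill_rate(prior_rate: float, prior_n: float,
                        n_kill: int, n_defect: int) -> float:
    """Posterior mean kill rate under a Beta prior with mean prior_rate and weight prior_n."""
    alpha = prior_rate * prior_n          # prior pseudo-kills
    beta = (1.0 - prior_rate) * prior_n   # prior pseudo-passes
    return (alpha + n_kill) / (alpha + beta + n_defect)
```

With prior_n = 120, the metal-1 example above gives 92/123 ≈ 0.748. The prior_n value sets how strongly the prior resists new data. One design option is therefore to cap it, for example at 20 to 30 pseudo-observations, so that a genuine excursion can move the estimate within a few lots. The cap value is a tuning choice, not a derived quantity.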
A second strategy is to widen the spatial correlation radius when kill-rate estimates are sparse. Rather than requiring the defect centroid to fall within the exact die boundary, apply a small spatial tolerance of ±2 to 5 microns, comparable to inspection tool positioning repeatability. This captures defects that functionally belong to a die but land just outside its boundary because of coordinate-system noise. The wider tolerance increases the sample size at the cost of some attribution precision. That trade-off is acceptable in low-data situations, but it is worth flagging in the output so engineers know the estimate is tolerance-widened.
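The following sketches tolerance-widened assignment under simple die-grid assumptions: each die sits at the low corner of its pitch cell, and the street is on the high side. The function name is hypothetical. A defect strictly inside a die is returned with widened=False. A street defect within tol microns of the nearest die edge is assigned to that die with widened=True, so the flag can flow through to the output.

```python
import math
from typing import Optional, Tuple


def assign_with_tolerance(
    x: float, y: float,
    pitch_x: float, pitch_y: float,
    origin_x: float, origin_y: float,
    die_w: float, die_h: float,
    tol: float,
) -> Tuple[Optional[Tuple[int, int]], bool]:
    """Return ((col, row) or None, widened) for a wafer-space point in microns."""
    col = math.floor((x - origin_x) / pitch_x)
    row = math.floor((y - origin_y) / pitch_y)
    dx = (x - origin_x) - col * pitch_x
    dy = (y - origin_y) - row * pitch_y
    if dx < die_w and dy < die_h:
        return (col, row), False  # strictly inside a die: no widening needed
    # In the street: the nearest die is this cell's or a +X, +Y, or diagonal neighbor's.
    best_dist, best_die = math.inf, None
    for i in (0, 1):
        for j in (0, 1):
            left, bottom = i * pitch_x, j * pitch_y
            gap_x = max(left - dx, 0.0, dx - (left + die_w))
            gap_y = max(bottom - dy, 0.0, dy - (bottom + die_h))
            dist = math.hypot(gap_x, gap_y)
            if dist < best_dist:
                best_dist, best_die = dist, (col + i, row + j)
    if best_dist <= tol:
        return best_die, True
    return None, False
```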
The core engineering value of this join is not a single number. It is the ranked list of layer-family pairs sorted by kill rate times volume, where kill rate × defect-die count per wafer = expected die loss per wafer from that layer-family. That list tells you where to focus process improvement resources, and it updates lot by lot as new inspection and probe data arrive. Fabs that don't maintain this join can show flat or improving defect counts while electrical yield stays stagnant, because they're improving layers that don't matter. We've seen that pattern in multiple fabs, and it's entirely preventable once the correlation infrastructure is in place.
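Once kill rates and per-wafer defect-die counts exist, the ranking itself takes only a few lines. In this sketch, both inputs are dicts keyed by (family, layer), and the names are illustrative.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (family, layer)


def rank_layer_families(
    kill_rate: Dict[Key, float],
    defect_die_per_wafer: Dict[Key, float],
) -> List[Tuple[Key, float]]:
    """Return (key, expected die loss per wafer) pairs, largest loss first."""
    loss = {
        key: rate * defect_die_per_wafer.get(key, 0.0)
        for key, rate in kill_rate.items()
    }
    return sorted(loss.items(), key=lambda kv: kv[1], reverse=True)
```

Consider the two layers from the defect-count comparison earlier: 200 defects per wafer at a 0.02 kill rate versus 8 at 0.90, treating each defect as a distinct die. Given these inputs, the function ranks the low-count layer first, at about 7.2 expected die lost per wafer against 4.0.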