Engineering Blog

Fab-Wide Lineage Tracking: Connecting Lot IDs, Equipment Chambers, and Recipe Versions in Real Time

When a defect excursion fires, the first question a yield engineer asks is not "what is the defect?" — the classification engine handles that. The first question is "which tool chamber, running which recipe version, processed these lots?" That question requires a lineage index: a live, queryable record that maps every lot that passed through the fab to the specific equipment instances it touched. Building that index from the data sources available in a real production environment is the hard part. This article covers how it's done.

What a Lineage Index Contains

A fully populated lineage record for a single lot-process-step binding looks like this:

  • Lot ID: the primary key shared across all data systems (inspection, probe, MES, equipment)
  • Wafer sequence range: which wafers in the lot were processed in this run (some lots are split across runs on the same tool)
  • Process step code: the MES operation identifier (e.g., "M1-ETCH-01") that identifies the step in the process flow
  • Tool ID: the equipment identifier from the MES equipment master (e.g., "ETCH-7B")
  • Chamber ID: the specific chamber within a multi-chamber cluster tool (e.g., "CH-C")
  • Recipe name and version hash: the process recipe executed, including a version identifier or hash that distinguishes recipe revisions
  • Processing start and end timestamps: recorded with sub-minute precision for excursion window reconstruction
  • Shift designation: the shift (day/night/weekend) active during processing, for shift-correlated excursion analysis
  • Operator ID: present where available, for setup-correlated excursion investigation
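The field list above maps naturally onto a typed record. The sketch below is one way to model it in Python; the class name, field names, and the choice of (lot, step, tool, start time) as the natural key are illustrative assumptions, not a schema from any particular MES.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LineageRecord:
    """One lot-process-step binding in the lineage index (hypothetical field names)."""
    lot_id: str
    wafer_first: int                 # first wafer slot processed in this run
    wafer_last: int                  # last wafer slot processed in this run (inclusive)
    step_code: str                   # MES operation identifier, e.g. "M1-ETCH-01"
    tool_id: str                     # equipment master ID, e.g. "ETCH-7B"
    chamber_id: Optional[str]        # e.g. "CH-C"; None until chamber attribution succeeds
    recipe_name: str
    recipe_version: Optional[str]    # version ID or content hash; None if not captured
    start: datetime
    end: datetime
    shift: str                       # "day", "night", or "weekend"
    operator_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject records that cannot describe a real run.
        if self.wafer_first > self.wafer_last:
            raise ValueError("wafer range is inverted")
        if self.end < self.start:
            raise ValueError("end timestamp precedes start")

    @property
    def wafer_count(self) -> int:
        return self.wafer_last - self.wafer_first + 1

    def key(self) -> tuple:
        """Natural key: one record per lot per step per equipment run."""
        return (self.lot_id, self.step_code, self.tool_id, self.start)
```

Keeping chamber_id and recipe_version optional reflects the reality described in the following sections: those two fields often arrive later than the rest, or not at all.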

A complete fab lineage index contains one record per lot per process step per equipment run. For a product with 400 process steps, that is at least 400 records per lot, and more when a lot is split across runs. At 20,000 wafer starts per month and 25 wafers per lot, that's 800 lots per month and 320,000 lineage records per month. Over a 12-month rolling window, the index holds roughly 3.8 million records, a size well within what a PostgreSQL or TimescaleDB instance handles without special infrastructure.
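The sizing arithmetic is easy to rerun for other products or fab loadings. This small helper reproduces the figures above; the runs_per_step parameter is an added assumption for modelling split lots (1.0 means exactly one run per step).

```python
def lineage_index_size(wafer_starts_per_month: int, wafers_per_lot: int,
                       steps_per_lot: int, retention_months: int,
                       runs_per_step: float = 1.0) -> dict:
    """Estimate lineage index size; runs_per_step > 1.0 models lots split across runs."""
    lots_per_month = wafer_starts_per_month // wafers_per_lot
    records_per_month = round(lots_per_month * steps_per_lot * runs_per_step)
    return {
        "lots_per_month": lots_per_month,
        "records_per_month": records_per_month,
        "records_in_window": records_per_month * retention_months,
    }


print(lineage_index_size(20_000, 25, 400, 12))
# {'lots_per_month': 800, 'records_per_month': 320000, 'records_in_window': 3840000}
```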

Data Sources for Lineage Records

Lineage data comes from three sources in a modern fab: the MES (Manufacturing Execution System), the SECS/GEM host interface for real-time equipment event capture, and equipment-side log exports for tools not yet on live SECS/GEM integration.

The MES (Workstream or Camstar in most advanced-node fabs) is the system of record for lot-to-operation bindings. Every time a lot is dispatched to a tool and the run completes, the MES records the tool ID, the operation, the start and end times, and the wafer count. This is the backbone of the lineage index. The limitation is that for multi-chamber tools, most MES deployments record tool ID at the equipment level rather than the chamber level: SEMI standards only recently formalized chamber-level tracking, and most MES installations predate those updates.

SECS/GEM provides chamber-level granularity — when the equipment is configured to expose it. As discussed in the SECS/GEM data ingestion article, the collection event that fires at lot-end should carry the chamber ID that processed the lot, but only if the equipment's GEM implementation was configured at qualification time to include chamber-level binding in its lot-end report. For equipment without that configuration, the SECS/GEM connection provides timestamps and recipe names but not chamber attribution.

Equipment-side log exports are the fallback. Most process tools maintain internal job logs that record which chamber processed each carrier, with timestamps accurate to a few seconds. Exporting and parsing these logs — through file share access, FTP, or equipment data push to a shared directory — is less elegant than a live SECS/GEM event stream, but it works for tools where GEM chamber binding wasn't configured at installation. The main risk is log retention policy: some tools overwrite their internal logs after 7 to 14 days. If your excursion investigation runs longer than the log retention window, you've lost the raw evidence.
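One way to manage the retention risk is a periodic check that flags runs whose tool-log entries are about to be overwritten before anyone has exported them. The sketch below assumes you maintain a per-tool retention table and track which runs have already been exported; the function name, the two-day warning horizon, and the 7-day default for tools with unknown retention are hypothetical choices.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

Run = Tuple[str, str, datetime, bool]  # (lot_id, tool_id, processed_at, already_exported)


def runs_at_risk(runs: Iterable[Run],
                 retention_days_by_tool: Dict[str, int],
                 now: datetime,
                 horizon: timedelta = timedelta(days=2),
                 default_retention_days: int = 7) -> List[Tuple[str, str, datetime]]:
    """List unexported runs whose tool-log entries expire within `horizon` of `now`
    (or already have), ordered by expiry. Tools with unknown retention are assumed
    to keep logs for `default_retention_days`, the short end of the 7-14 day range."""
    at_risk = []
    for lot_id, tool_id, processed_at, exported in runs:
        if exported:
            continue  # raw evidence already copied off the tool
        days = retention_days_by_tool.get(tool_id, default_retention_days)
        expiry = processed_at + timedelta(days=days)
        if expiry - now <= horizon:
            at_risk.append((lot_id, tool_id, expiry))
    return sorted(at_risk, key=lambda item: item[2])
```

Runs that have already passed their expiry still appear in the output, so the same report doubles as a list of lineage gaps that can no longer be back-filled from the tool.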

Building the Index: MES Integration Architecture

For Workstream MES, lineage data is accessible through the Workstream web services API, which exposes lot history queries (what operations has a lot completed, in what order, on which equipment) as REST endpoints. The lot history response includes operation code, equipment ID, start and end times, and sometimes recipe program name — but not chamber ID or recipe version hash in the standard response.
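Because the MES response covers only part of the lineage record, it helps to normalize each lot-history entry into lineage field names and mark explicitly which fields still need enrichment from other sources. The input keys below (lotId, operation, equipmentId, trackInTime, trackOutTime, recipeProgram) are hypothetical placeholders, not the documented Workstream response schema; substitute whatever your MES actually returns.

```python
from typing import Any, Dict


def normalize_lot_history(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Map one MES lot-history entry onto lineage field names.

    Input keys are hypothetical placeholders, not a documented MES schema.
    """
    record: Dict[str, Any] = {
        "lot_id": entry["lotId"],
        "step_code": entry["operation"],
        "tool_id": entry["equipmentId"],
        "start": entry["trackInTime"],
        "end": entry["trackOutTime"],
        "recipe_name": entry.get("recipeProgram"),  # present only in some responses
        # Not in the standard lot-history response; filled later from
        # SECS/GEM events or equipment job logs.
        "chamber_id": None,
        "recipe_version": None,
    }
    # Record which lineage fields downstream enrichment still has to supply.
    record["needs_enrichment"] = [
        f for f in ("recipe_name", "chamber_id", "recipe_version") if record[f] is None
    ]
    return record
```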

For Camstar, lot history is accessible through the Camstar Service Interface (CSI) framework. The query model is different — Camstar represents manufacturing history as container transactions in a document model — but the available data is similar to Workstream. Chamber-level data in Camstar requires custom field additions at the container transaction level, which must be agreed upon with the MES configuration team.

The practical integration pattern we've landed on for MES lot-history ingestion is event-triggered rather than polled. Rather than running a periodic query against the MES API to find newly completed operations, we consume the MES's transaction event stream — Workstream publishes these via JMS topics; Camstar via its CSI event bus — and maintain the lineage index in near-real-time as each operation completes. This keeps index latency under 30 seconds for the MES-sourced fields, which is sufficient for combining with SECS/GEM event data that arrives with similar latency.
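The core of event-triggered maintenance is an idempotent merge: MES and SECS/GEM events for the same run may arrive in any order, and a replayed or partial event must not erase a field another source already supplied. The in-memory sketch below illustrates that merge rule only; the JMS or event-bus transport is out of scope, and it assumes each event has already been resolved to the natural key (lot, step, tool, run start).

```python
from typing import Any, Dict, Iterable, Tuple

Key = Tuple[str, str, str, str]  # (lot_id, step_code, tool_id, start)


class LineageIndex:
    """In-memory stand-in for the lineage table; events are merged as they arrive."""

    def __init__(self) -> None:
        self.rows: Dict[Key, Dict[str, Any]] = {}

    def apply(self, event: Dict[str, Any]) -> None:
        key = (event["lot_id"], event["step_code"], event["tool_id"], event["start"])
        row = self.rows.setdefault(key, {})
        for field, value in event.items():
            # Merge rather than overwrite: a None in a later event means
            # "this source doesn't know", so it never erases an existing value.
            if value is not None:
                row[field] = value


def consume(events: Iterable[Dict[str, Any]], index: LineageIndex) -> None:
    """Apply events in arrival order; replaying an event leaves the index unchanged."""
    for event in events:
        index.apply(event)
```

In a database-backed index, the same rule would typically become an upsert on the natural key that coalesces incoming nulls with existing column values.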

Chamber Attribution: When the MES Doesn't Know

The gap between tool-level MES records and chamber-level lineage is the most common root-cause attribution bottleneck we encounter. When an engineer asks "which specific chamber on the ETCH-7B cluster processed lot X?", the MES record says "ETCH-7B" and no more. Answering that question correctly requires one of three approaches.

First, if the equipment has a live SECS/GEM connection with chamber-level reporting configured, the SECS/GEM event stream carries chamber ID in the lot-end collection event report. This is the preferred path and should be part of equipment qualification requirements for any new tool installation.

Second, for existing tools without chamber-level SECS/GEM reporting, cross-correlation with the equipment's internal process log provides an indirect chamber attribution. The log carries a job sequence with timestamps and chamber assignments. Matching the lot's MES processing window to the job sequence in the equipment log gives chamber attribution at the cost of parsing equipment-specific log formats — which are not standardized and require per-tool-family parsers.
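Once a tool's job log has been parsed into (carrier, chamber, start, end) tuples, the matching step reduces to interval overlap. The sketch below assumes that parsed shape (producing it is the per-tool-family part) and widens the MES window by a hypothetical 30-second tolerance to absorb clock skew between the MES and the tool.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

JobLogEntry = Tuple[str, str, datetime, datetime]  # (carrier_id, chamber_id, start, end)


def _overlap(a_start: datetime, a_end: datetime,
             b_start: datetime, b_end: datetime) -> timedelta:
    """Length of the intersection of two time intervals (zero if disjoint)."""
    return max(timedelta(0), min(a_end, b_end) - max(a_start, b_start))


def attribute_chambers(mes_start: datetime, mes_end: datetime,
                       job_log: List[JobLogEntry],
                       carrier_id: Optional[str] = None,
                       tolerance: timedelta = timedelta(seconds=30)
                       ) -> List[Tuple[str, timedelta]]:
    """Return (chamber, total overlap) for job-log entries overlapping the lot's
    MES window, largest overlap first. If carrier_id is given, only that
    carrier's jobs are considered."""
    lo, hi = mes_start - tolerance, mes_end + tolerance
    totals: Dict[str, timedelta] = {}
    for carrier, chamber, start, end in job_log:
        if carrier_id is not None and carrier != carrier_id:
            continue
        ov = _overlap(lo, hi, start, end)
        if ov > timedelta(0):
            totals[chamber] = totals.get(chamber, timedelta(0)) + ov
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Returning every overlapping chamber, rather than only the best match, keeps the case visible where a lot's wafers were distributed across several chambers of the same cluster tool.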

Third, statistical inference from spatial defect pattern correlation. If the spatial pattern engine has identified that the defect cluster on lot X has a geometry matching prior confirmed excursions from chamber C of tool ETCH-7B, that historical pattern overlap is itself evidence for chamber attribution, independent of the equipment log. This is not a definitive attribution — it's a ranked hypothesis — but it allows engineers to start the investigation while the equipment log retrieval completes.
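The three approaches form a natural precedence order, and it is worth recording which one produced each attribution so that a pattern-based hypothesis is never mistaken for a confirmed binding. A minimal resolver, assuming the inputs have already been computed (the names and the similarity-score input are hypothetical):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Attribution:
    source: str          # "secsgem", "equipment_log", or "pattern_inference"
    definitive: bool     # False means a ranked hypothesis, not a confirmed binding
    chambers: List[str]  # best candidate first


def resolve_chamber(secsgem_chamber: Optional[str],
                    log_chambers: List[str],
                    pattern_similarity: Dict[str, float]) -> Optional[Attribution]:
    """Apply the precedence SECS/GEM > equipment log > spatial-pattern inference."""
    if secsgem_chamber:
        return Attribution("secsgem", True, [secsgem_chamber])
    if log_chambers:
        return Attribution("equipment_log", True, list(log_chambers))
    if pattern_similarity:
        # Rank chambers by similarity to prior confirmed excursions.
        ranked = sorted(pattern_similarity, key=pattern_similarity.get, reverse=True)
        return Attribution("pattern_inference", False, ranked)
    return None  # no evidence yet; leave chamber_id unset
```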

In our experience, the multi-chamber attribution problem is where most root-cause triage time is lost when lineage tracking is incomplete. Engineers who know the defect came from ETCH-7B, but not which of its four chambers was responsible, spend the investigation on manual job log searches that should have been automated.

Recipe Version Tracking and Change-Point Detection

Recipe version tracking matters because many excursions originate from recipe changes — process engineers adjusting etch time, deposition rate, or anneal temperature as part of routine process optimization — that silently shift yield before anyone notices the change's effect at probe. If the lineage index only records the recipe name and not the recipe version that was active at processing time, you cannot distinguish lots processed on recipe v14 from lots processed on the v15 revision that changed etch endpoint timing by 3 seconds.

Recipe version as a lineage field requires either SECS/GEM recipe version reporting (via the process program management events defined in GEM, or a custom status variable (SV) for the active recipe hash) or MES recipe version tracking (Workstream and Camstar both support recipe version management, but whether the version at processing time is captured in the lot history record depends on the MES configuration). In several fab integrations we've seen, the MES captures the recipe name at dispatch but not the version that was actually executed. The two differ whenever a recipe is updated on the equipment between dispatch and execution.
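Where neither source supplies a reliable version identifier, a content hash of the recipe parameters can serve as one: identical parameter sets always hash the same, and any edit, such as a 3-second endpoint change, produces a different hash. The sketch below assumes the parameter sets are available as flat dictionaries (for example, from the dispatched recipe and from a recipe body retrieved from the tool); the function names and the 16-character truncation are illustrative choices.

```python
import hashlib
import json
from typing import Any, Dict


def recipe_version_hash(params: Dict[str, Any]) -> str:
    """Deterministic content hash: key order and whitespace do not affect the result."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def version_drift(dispatched: Dict[str, Any], executed: Dict[str, Any]) -> Dict[str, Any]:
    """Compare the recipe the MES dispatched with the one the tool actually ran."""
    d_hash, e_hash = recipe_version_hash(dispatched), recipe_version_hash(executed)
    changed = sorted(k for k in dispatched.keys() | executed.keys()
                     if dispatched.get(k) != executed.get(k))
    return {"dispatched_hash": d_hash, "executed_hash": e_hash,
            "drifted": d_hash != e_hash, "changed_params": changed}
```

Storing the executed hash in the lineage record, not the dispatched one, is what lets lots run on v14 and v15 be separated even when the MES only ever saw the recipe name.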

Once recipe versions are in the lineage index, change-point detection becomes straightforward: for each tool-operation-recipe combination, compute the running mean defect density for lots processed on each recipe version. A statistically significant step increase in defect density coinciding with a recipe version change is strong evidence that the change caused the excursion, even before chamber-level attribution is complete. This narrows the investigation scope from "something changed on ETCH-7B" to "the v15 recipe revision on ETCH-7B changed something that increased particle generation."
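One simple way to make "statistically significant step increase" concrete is to compare lot-level defect densities on consecutive recipe versions with Welch's t statistic, which does not assume equal variance between versions. This swaps a per-version two-sample comparison in for the running-mean description above, and the threshold of 3.0 is a hypothetical setting to be tuned per operation; a production system might prefer a proper p-value or a CUSUM-style detector.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def welch_t(before: List[float], after: List[float]) -> float:
    """Welch's t statistic for mean(after) - mean(before); each list needs >= 2 values."""
    diff = mean(after) - mean(before)
    se = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
    if se == 0.0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def recipe_step_increases(density_by_version: List[Tuple[str, List[float]]],
                          t_threshold: float = 3.0) -> List[Tuple[str, str, float]]:
    """Flag consecutive recipe versions where lot defect density steps up.

    density_by_version is ordered by release time: [(version, [density per lot]), ...]
    for a single tool-operation-recipe combination.
    """
    flagged = []
    for (v_old, old), (v_new, new) in zip(density_by_version, density_by_version[1:]):
        if len(old) < 2 or len(new) < 2:
            continue  # too few lots on one side to estimate variance
        t = welch_t(old, new)
        if t > t_threshold:
            flagged.append((v_old, v_new, t))
    return flagged
```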

Root-Cause Hypothesis Ranking

With a populated lineage index and an active excursion, the root-cause hypothesis ranking algorithm computes a ranked list of equipment-chamber-recipe hypotheses most likely associated with the observed defect cluster. The ranking is based on three signals: defect pattern overlap between the current excursion and prior confirmed excursions from each candidate tool-chamber, frequency of lot exposure (how many of the affected lots passed through each candidate chamber in the excursion time window), and recency of any recipe or maintenance events on each candidate chamber that could explain a process shift.
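A weighted sum is one simple way to combine the three signals into a single ranking score. In the sketch below, the weights (0.4, 0.4, 0.2), the exponential recency decay, and its 72-hour half-life are hypothetical tuning choices, not the production algorithm; pattern overlap is assumed to arrive already normalized to the range 0 to 1.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    tool_id: str
    chamber_id: str
    recipe_version: str
    pattern_overlap: float     # 0..1 similarity to prior confirmed excursions
    lots_exposed: int          # affected lots that ran on this chamber in the window
    hours_since_change: float  # since last recipe/maintenance event; math.inf if none


def score(c: Candidate, affected_lots: int,
          weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),
          half_life_h: float = 72.0) -> float:
    exposure = c.lots_exposed / affected_lots if affected_lots else 0.0
    # 1.0 immediately after a change, halving every half_life_h hours; 0.0 if no change.
    recency = 0.5 ** (c.hours_since_change / half_life_h)
    w_pattern, w_exposure, w_recency = weights
    return w_pattern * c.pattern_overlap + w_exposure * exposure + w_recency * recency


def rank(candidates: List[Candidate], affected_lots: int, **kwargs) -> List[Candidate]:
    """Highest-scoring tool-chamber-recipe hypothesis first."""
    return sorted(candidates, key=lambda c: score(c, affected_lots, **kwargs), reverse=True)
```

A linear combination keeps each hypothesis explainable: an engineer can see exactly how much of a candidate's score came from pattern match, lot exposure, or a recent change.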

A hypothesis that scores high on all three dimensions — matching spatial pattern, high lot overlap, and a recent recipe change or maintenance event — goes to the top of the ranked list. The engineer's job is to confirm the top hypothesis, not to search from scratch. That shift from open-ended investigation to confirmation of a ranked shortlist is what compresses triage time from 40 hours to under 90 minutes in production deployments where lineage tracking is complete.