Engineering Blog

SECS/GEM Data Ingestion for Yield Analysis: What the Standard Allows and Where Fabs Add Custom Extensions

SECS/GEM is the substrate on which most modern fab automation is built, but it's also one of the most underspecified interfaces in production use. The SEMI E5 and E37 standards define a messaging framework, not a data schema. What actually flows through a SECS/GEM connection depends heavily on how each equipment vendor implemented their GEM host interface — and those implementations vary in ways that matter when you're trying to build a consistent yield analysis pipeline across a mixed-tool floor.

What SEMI E5 and E37 Actually Define

SEMI E5 is the SECS-II message encoding standard. It defines the binary message format, the data item types (lists, binary, Boolean, ASCII, signed and unsigned integers of 1, 2, 4, and 8 bytes, and 4- and 8-byte floats, written L, B, BOOLEAN, A, I1 through I8, U1 through U8, F4, and F8), and the stream-function addressing scheme that identifies each message exchanged between host and equipment. SEMI E37 is the HSMS (High-Speed Message Services) transport layer that runs SECS-II over TCP/IP; it replaced the older SECS-I serial transport (SEMI E4, over RS-232) for most modern tool generations.
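To make the encoding concrete, here is a minimal sketch of how a SECS-II data item is laid out on the wire: a format byte (a 6-bit format code plus a 2-bit count of length bytes), the length, then the big-endian payload. It covers only a subset of item types, and the function names (encode_item, _header) are illustrative, not taken from any particular SECS library.

```python
import struct

# SECS-II format codes (octal) for a subset of the item types in SEMI E5.
FORMAT_CODES = {
    "L": 0o00, "B": 0o10, "A": 0o20,
    "I1": 0o31, "I2": 0o32, "I4": 0o34,
    "F8": 0o40, "F4": 0o44,
    "U1": 0o51, "U2": 0o52, "U4": 0o54,
}

# Matching struct codes for the numeric types; SECS-II is big-endian.
STRUCT_CODES = {
    "I1": "b", "I2": "h", "I4": "i",
    "F8": "d", "F4": "f",
    "U1": "B", "U2": "H", "U4": "I",
}


def _header(fmt: str, length: int) -> bytes:
    """Format byte (6-bit code, 2-bit number of length bytes) followed by the length."""
    if length >= 2**24:
        raise ValueError("SECS-II item length is limited to 3 length bytes")
    n_len_bytes = 1 if length < 2**8 else 2 if length < 2**16 else 3
    format_byte = (FORMAT_CODES[fmt] << 2) | n_len_bytes
    return bytes([format_byte]) + length.to_bytes(n_len_bytes, "big")


def encode_item(fmt: str, value) -> bytes:
    """Encode one item. A list value is a sequence of (fmt, value) pairs."""
    if fmt == "L":
        body = b"".join(encode_item(f, v) for f, v in value)
        return _header("L", len(value)) + body  # list length counts elements, not bytes
    if fmt == "A":
        body = value.encode("ascii")
    elif fmt == "B":
        body = bytes(value)
    else:
        values = value if isinstance(value, (list, tuple)) else [value]
        body = struct.pack(f">{len(values)}{STRUCT_CODES[fmt]}", *values)
    return _header(fmt, len(body)) + body


# A two-element list carrying a lot ID and a wafer count (values are made up).
message_body = encode_item("L", [("A", "LOT1"), ("U1", 25)])
```

Note that nothing in this layer says what the two elements mean. That is the point made above: the encoding is standardized, the schema is not.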

GEM (SEMI E30) sits on top of SECS-II and defines the host-equipment interaction model: equipment constants, status variables, data collection events, alarms, process programs, and remote commands. GEM specifies what categories of information equipment must expose, not what specific values those variables carry. An equipment constant EC_PRESSURE might be present on one tool and absent on another running the same process step on a different manufacturer's platform.

This layered architecture matters for yield analysis ingestion because the data you can reliably extract from a SECS/GEM connection is only as good as the equipment's GEM implementation. SEMI E30 defines a compliance checklist, but full compliance does not mean rich compliance. A tool can be GEM-compliant while exposing only the minimum required status variables and none of the optional process-recipe parameters that a yield engineer actually wants.

Standard Event Categories: What You Can Count On

Across equipment from major semiconductor tool vendors — etch, CVD, CMP, lithography — there are several GEM event categories that are reliably available in practice:

  • Lot and wafer processing events: PROCESS_STATE transitions (IDLE → SETUP → EXECUTING → PAUSED → COMPLETING), lot ID and wafer ID bindings, carrier ID association. These are the backbone of lineage tracking and are present on essentially all GEM-compliant equipment.
  • Alarm events: ALARM_SET and ALARM_CLEAR with alarm ID and text. Alarm IDs are equipment-specific and not standardized across vendors, but the event framework is universal.
  • Equipment constants (ECs): Read-accessible parameters set by the process engineer, such as chamber temperature setpoints, gas flow nominal values, and RF power targets. Whether these are exposed through GEM vs. through a proprietary recipe server varies by tool generation.
  • Status variables (SVs): Real-time equipment state readings. Required SVs include ControlState, ProcessState, and a small set of defined process-state timestamps. Optional SVs — chamber-level temperature readings, sensor histories — are present on modern tools but were often omitted from equipment shipped before 2015.

For yield analysis specifically, the most valuable standard event is the PROCESS_JOB_COMPLETE event (or equivalent lot-end event in the equipment's implementation), which should carry the lot ID, wafer count, and the recipe program name that was executed. If this event fires reliably and carries the recipe version, you have the backbone of a lineage index without any custom extensions.

Where Standard GEM Falls Short for Yield Analysis

The limitations of standard GEM for yield analysis cluster around three gaps.

First, standard GEM doesn't define chamber-level granularity for multi-chamber tools. A cluster tool with four process chambers running the same recipe will report a single lot-processing event at the tool level. To determine which specific chamber processed which wafer, you need either chamber-level SV reporting (if the equipment supports it as a custom extension) or correlation against the equipment's internal maintenance logs via a separate query channel. This gap is why multi-chamber etch tools and CVD systems are often the hardest equipment to attribute defect excursions to — the standard event doesn't tell you which chamber was responsible.

Second, recipe version tracking is inconsistent. GEM supports process program management (upload/download of recipe files to/from equipment), but the recipe version that was active during a specific lot's processing is not guaranteed to be available as a reportable SV. Some tool implementations carry the active recipe name in the lot-end event; others require a separate host query after the fact. For yield analysis, a recipe name recovered by querying an equipment log 48 hours later is not equivalent to a real-time binding: the recipe may have been edited or replaced in the interim, so the query can return a version that never ran on the lot.
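One way to keep that distinction visible downstream is to record where each recipe binding came from and how stale it was when captured. The sketch below is an assumption about how a pipeline might model this; RecipeBinding, bind_recipe, and the source labels are hypothetical names, not part of GEM.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RecipeBinding:
    lot_id: str
    recipe_name: str
    source: str      # "lot_end_event" or "host_query" (hypothetical labels)
    lag: timedelta   # time between lot end and when the recipe name was observed


def bind_recipe(
    lot_id: str,
    lot_end_time: datetime,
    event_recipe: Optional[str],
    queried_recipe: Optional[str] = None,
    query_time: Optional[datetime] = None,
) -> Optional[RecipeBinding]:
    """Prefer the recipe carried in the lot-end event; fall back to a later query."""
    if event_recipe:
        return RecipeBinding(lot_id, event_recipe, "lot_end_event", timedelta(0))
    if queried_recipe and query_time is not None:
        return RecipeBinding(lot_id, queried_recipe, "host_query", query_time - lot_end_time)
    return None  # no binding available; better to record the gap than to guess


# Example: the tool did not report a recipe at lot end, so a query 48 hours later is used.
lot_end = datetime(2024, 1, 1, 8, 0)
late = bind_recipe("LOT42", lot_end, None, "ETCH_A_v7", lot_end + timedelta(hours=48))
```

Analysis code can then filter or down-weight bindings by source and lag instead of treating every recipe name as equally trustworthy.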

Third, sub-lot and per-wafer process metadata (actual process parameter values measured during the run, not just the setpoints) typically live in the equipment's own data collection framework rather than in standard GEM SVs. These are the EDA (Equipment Data Acquisition) values covered by SEMI E134 and the related Interface A standards, which many equipment generations don't implement consistently.

Common Fab-Specific Custom Extensions

Most fabs running 14nm-and-below nodes have added custom GEM extensions to work around the gaps above. In our experience integrating with fab SECS/GEM hosts, the most common extensions fall into these categories:

  • Chamber-ID binding in lot events: Adding a custom SVID or report variable that carries the chamber designation (e.g., "CH-A", "CH-C") in the PROCESS_JOB_COMPLETE report. This requires a custom report definition on both the host and equipment sides, and must be agreed upon at equipment qualification time.
  • Recipe version as an SV: Exposing the active recipe name and version hash as a readable SV, so the host can request it at any point during processing rather than relying on the lot-end event. This is common on tools that see frequent recipe changes driven by process optimization cycles.
  • Expanded alarm taxonomy: Adding fab-specific alarm IDs with structured codes that map to the fab's internal maintenance classification system. Standard GEM alarm text is free-form and not reliably machine-parseable; custom alarm codes allow automated routing to the right maintenance queue.
  • Yield-relevant derived metrics as SVs: Some fabs push the equipment supplier to expose derived process metrics — etch uniformity index, deposition rate variance, particle count from the equipment's own in-situ particle monitor — as GEM SVs. This bridges the gap between standard GEM and the EDA data that would otherwise require SEMI E134 integration.
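The first extension in the list above, chamber-ID binding, comes down to a report definition and an event link. As a rough sketch, the bodies of the S2F33 (Define Report) and S2F35 (Link Event Report) messages can be modeled as nested lists. Every numeric ID below is hypothetical; real SVIDs, RPTIDs, and CEIDs come from the equipment's GEM documentation and the agreement reached at qualification.

```python
from typing import Dict, List

# Hypothetical IDs, for illustration only.
SVID_LOT_ID = 2001
SVID_RECIPE_NAME = 2002
SVID_CHAMBER_ID = 9001        # the custom, fab-negotiated variable
RPTID_LOT_END = 5001
CEID_PROCESS_JOB_COMPLETE = 3001


def define_report_body(data_id: int, reports: Dict[int, List[int]]) -> list:
    """Shape of an S2F33 body: [DATAID, [[RPTID, [VID, ...]], ...]]."""
    return [data_id, [[rptid, list(vids)] for rptid, vids in reports.items()]]


def link_event_body(data_id: int, links: Dict[int, List[int]]) -> list:
    """Shape of an S2F35 body: [DATAID, [[CEID, [RPTID, ...]], ...]]."""
    return [data_id, [[ceid, list(rptids)] for ceid, rptids in links.items()]]


# A lot-end report that carries the chamber ID alongside lot ID and recipe name.
s2f33 = define_report_body(
    1, {RPTID_LOT_END: [SVID_LOT_ID, SVID_RECIPE_NAME, SVID_CHAMBER_ID]}
)
s2f35 = link_event_body(1, {CEID_PROCESS_JOB_COMPLETE: [RPTID_LOT_END]})
```

The host-side definition is the easy half. The equipment must actually expose the chamber variable for the report to carry anything useful.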

Event-Push vs. Host-Poll Architectures

GEM supports two patterns for data acquisition. In the event-push model, the equipment sends an event report (S6F11, Event Report Send) to the host whenever a collection event fires: lot start, lot end, alarm, state change. The host receives data passively and processes it in the order events arrive. In the host-poll model, the host periodically sends S6F15 (Event Report Request) or S1F3 (Selected Equipment Status Request) messages to pull current state from the equipment.

For yield analysis ingestion, event-push is the correct architecture. Polling introduces latency that scales with the poll interval: with a 60-second poll, a lot-end event can reach the yield analysis pipeline up to 60 seconds late. Against a 12-minute target for alert delivery after inspection, that is up to about 8% of the total budget spent on preventable latency at the ingestion layer.
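The latency arithmetic is simple enough to check directly. This small sketch computes the worst-case and mean share of an alert budget consumed by polling. The mean assumes events arrive uniformly at random within a poll cycle, which is an assumption made here rather than a measured property of any tool.

```python
from typing import Tuple


def polling_latency_share(poll_interval_s: float, budget_s: float) -> Tuple[float, float]:
    """Return (worst_case_share, mean_share) of the budget consumed by polling delay."""
    worst_case = poll_interval_s / budget_s
    mean = (poll_interval_s / 2) / budget_s  # uniform-arrival assumption
    return worst_case, mean


# 60-second poll against a 12-minute (720-second) alert budget.
worst, mean = polling_latency_share(60, 12 * 60)
print(f"worst case: {worst:.1%}, mean: {mean:.1%}")
```

With these inputs the worst case is one twelfth of the budget and the mean is half of that.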

In practice, we've found that host-poll architectures were often chosen not for technical reasons but because the early integration was done by a process engineer who had host access but not equipment-side GEM configuration permissions. The resulting polling architecture then persists for years because no one has the combination of access and motivation to change it.

The tradeoff is that event-push requires the equipment to be correctly configured to fire the relevant collection events and attach the right data to each report. If the equipment was qualified with a minimal GEM configuration, the reports won't carry the process metadata needed for yield analysis, and no amount of host-side architecture improvement will fix that. You need to go back to equipment qualification and expand the report definitions.

Practical Notes for Integration Design

When building a SECS/GEM ingestion layer for yield analysis, the most important first step is an equipment audit — not a documentation audit, but a live connection audit. For each tool, connect a SECS/GEM host monitor, capture the collection events that fire during a normal lot cycle, and inventory what data is actually present in each report. Documentation from the equipment vendor is often years out of date relative to the software version running on the tool.
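The inventory step of such an audit can be sketched as a small aggregation over captured reports. The input shape, (CEID, {variable name: value}) pairs, is an assumption about what a host monitor export might be reduced to; inventory_reports and the variable names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def inventory_reports(
    captured: Iterable[Tuple[int, Dict[str, object]]]
) -> Dict[int, Dict[str, float]]:
    """For each CEID, return the fraction of reports in which each variable was populated."""
    event_counts: Dict[int, int] = defaultdict(int)
    populated: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ceid, variables in captured:
        event_counts[ceid] += 1
        for name, value in variables.items():
            if value not in (None, ""):  # present but empty counts as missing
                populated[ceid][name] += 1
    return {
        ceid: {name: n / event_counts[ceid] for name, n in populated[ceid].items()}
        for ceid in event_counts
    }


# Two captured lot-end reports; the second arrived with an empty recipe name.
captured = [
    (3001, {"LotID": "LOT1", "RecipeName": "DEP_B"}),
    (3001, {"LotID": "LOT2", "RecipeName": ""}),
]
coverage = inventory_reports(captured)
```

A variable that is populated in only some reports is often a more important finding than one that is absent entirely, because it tends to fail silently in production.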

From that audit, you'll identify three tiers of tools: those with rich event data that can feed directly into the yield analysis pipeline, those with sparse events that need custom extension negotiation with the fab's equipment engineering team, and legacy tools where SECS/GEM integration is so thin that a file-drop alternative (polling a network share for recipe log exports) is more practical than trying to retrofit GEM data collection.
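The tiering can be expressed as a simple rule over what the audit observed. The rule below is only an illustration: the variable names and the cut between tiers are assumptions, and a real fab would set them with its equipment engineering team.

```python
from typing import Set

# Hypothetical minimum set of variables needed to feed yield analysis directly.
REQUIRED_FOR_YIELD = {"LotID", "WaferID", "RecipeName"}


def classify_tool(observed_variables: Set[str]) -> str:
    """Map the variables seen in a live audit to one of the three tiers."""
    if REQUIRED_FOR_YIELD <= observed_variables:
        return "rich"        # can feed the pipeline as-is
    if "LotID" in observed_variables:
        return "sparse"      # usable, but needs custom extension negotiation
    return "file-drop"       # GEM data too thin; use a file-based path instead


tiers = {
    "ETCH07": classify_tool({"LotID", "WaferID", "RecipeName", "ChamberID"}),
    "CVD01": classify_tool({"LotID"}),
    "FURN02": classify_tool(set()),
}
```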

File-drop integration is not ideal — it introduces file-system dependencies and typically carries only end-of-lot summaries — but for equipment generations from 2010 and earlier, it may be the only practical path to yield-relevant data without a multi-month equipment software upgrade. A mixed-mode ingestion layer that handles both SECS/GEM event streams and file-drop sources under a common lot-ID index is the architecture most production fabs actually need.
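A minimal sketch of that mixed-mode idea: two adapters normalize GEM reports and file-drop exports into one record type, and a single index groups them by lot ID. The field names, the CSV column names, and the LotRecord type are all assumptions for illustration.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LotRecord:
    lot_id: str
    tool_id: str
    recipe_name: str
    source: str  # "gem" or "file_drop"


def from_gem_report(tool_id: str, report: Dict[str, str]) -> LotRecord:
    """Normalize one decoded lot-end report (hypothetical variable names)."""
    return LotRecord(report["LotID"], tool_id, report.get("RecipeName", ""), "gem")


def from_file_drop(tool_id: str, csv_text: str) -> List[LotRecord]:
    """Normalize an end-of-lot CSV export (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        LotRecord(row["lot_id"], tool_id, row.get("recipe") or "", "file_drop")
        for row in reader
    ]


def build_index(records: Iterable[LotRecord]) -> Dict[str, List[LotRecord]]:
    """Group records from all sources under a common lot-ID key."""
    index: Dict[str, List[LotRecord]] = defaultdict(list)
    for record in records:
        index[record.lot_id].append(record)
    return dict(index)


records = [from_gem_report("CVD01", {"LotID": "LOT42", "RecipeName": "DEP_B"})]
records += from_file_drop("FURN02", "lot_id,recipe\nLOT42,ANNEAL_C\n")
index = build_index(records)
```

Keeping the source on each record lets downstream analysis account for the lower fidelity of file-drop data without needing a separate code path.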