CN-121834248-B - Semiconductor valve analysis system integrating single Q-table reinforcement learning and multi-head attention

Abstract

The invention relates to the technical field of artificial intelligence and semiconductor manufacturing process monitoring, and in particular to a semiconductor valve analysis system integrating single Q-table reinforcement learning and multi-head attention. A multidimensional sensing front end acquires parameters, from which a heterogeneous data preprocessing module constructs a state vector. A multi-head attention feature fusion engine generates a global characterization from that vector. A single Q-table reinforcement learning machine guided by physical constraints limits the action space by means of a physical mechanism model and outputs an optimal instruction. Finally, a closed-loop execution and monitoring feedback module dynamically updates the value matrix. The invention can uncover deep feature associations, ensure that decisions obey physical laws, significantly improve learning efficiency and control stability, and safeguard semiconductor production.

Inventors

SONG HAIXIAO
XUE HUI

Assignees

上海聚克流体控制有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2026-03-13

Claims (10)

1. A semiconductor valve analysis system fusing single Q-table reinforcement learning and multi-head attention, comprising: a multidimensional sensing front end for synchronously collecting, in real time, physical parameters of the fluid, mechanical state parameters of the valve and environmental interference parameters in the semiconductor process chamber; a heterogeneous data preprocessing module for sampling, at high speed, the original signals acquired by the multidimensional sensing front end; a multi-head attention feature fusion engine for generating an enhanced global characterization vector based on an initial state vector sequence; a single Q-table reinforcement learning machine guided by physical constraints, for receiving the enhanced global characterization vector, storing state-action value information through 1024 preset finely divided discrete logic states and a single Q-table structure, limiting the action space to legal boundaries by means of an embedded physical mechanism model, performing iterative learning with the Bellman equation as the core update function, and outputting an optimal valve analysis instruction; and a closed-loop execution and monitoring feedback module for converting the valve analysis instruction into a high-resolution pulse width modulation signal through a digital signal conversion unit, sending that signal to a power drive control unit to drive the actuator, synchronously capturing the system state response after execution, and dynamically updating the value matrix in the single Q-table reinforcement learning machine guided by physical constraints by calculating the deviation between the actual response and the expected target.
2. The semiconductor valve analysis system according to claim 1, wherein the heterogeneous data preprocessing module applies a 5th-order Butterworth low-pass filter to remove high-frequency noise components, extracts the mean, variance and first-order rate of change of each parameter over the 100 milliseconds preceding the current moment through a sliding window algorithm, maps data of different dimensions into the numerical range 0 to 1 using min-max normalization, and constructs an initial state vector sequence with unified time-scale characteristics in combination with a state estimation algorithm based on Kalman filtering.
3. The semiconductor valve analysis system according to claim 2, wherein the multi-head attention feature fusion engine projects the initial state vector sequence into a plurality of mutually non-interfering feature spaces using a linear mapping layer to generate a query matrix, a key matrix and a value matrix, calculates the association weights between different physical dimensions with 8 self-attention heads in parallel, identifies the key process features that have a decisive influence on system stability, and sums the fused features with the initial state vector through a residual connection layer to generate the enhanced global characterization vector.
4. The semiconductor valve analysis system according to claim 3, wherein, during the decision process, the single Q-table reinforcement learning machine guided by physical constraints calculates in real time, through a boundary check operator, the pressure change rate and flow deviation that an action may cause, and, when the predicted result exceeds the safety boundary set by the physical mechanism model, applies a penalty reduction to the corresponding value in the single Q-table and redirects the decision to an adjacent safe action interval that conforms to physical laws.
5. The semiconductor valve analysis system according to claim 4, wherein the multi-head attention feature fusion engine further introduces a positional encoding mechanism when calculating the attention weights, embedding the time-sequence information of parameter sampling into the vectors so that the model can distinguish physical features appearing at different time points and thereby identify pressure pulse signals having a causal relationship; and identifies the parameter group with the greatest influence in a specific process step by computing statistics on the spatial distribution of attention, providing the weights of pressure sensor data in a deposition process stage and the weights of flow control data in an etching process stage.
6. The semiconductor valve analysis system according to claim 5, wherein the multi-head attention feature fusion engine comprises 8 independent attention heads that attend to different feature dimensions; the 1st to 3rd attention heads focus on capturing long-time-series pressure fluctuation trends; the 4th to 6th attention heads focus on capturing coupling interference between different gas paths; the 7th and 8th attention heads focus on capturing the instantaneous hysteresis relationship between the valve actuation command and the flow response; and the global characterization vector output by the multi-head attention feature fusion engine is also sent to a prediction branch for predicting the chamber pressure trend over the next 50 milliseconds, the predicted chamber pressure trend being fed into the closed-loop execution and monitoring feedback module as a feedforward compensation term.
7. The semiconductor valve analysis system according to claim 6, wherein the physical mechanism model stores the flow velocity limits of the semiconductor valve in different pressure intervals and the acceleration thresholds during opening and closing, takes the compressibility factor and adiabatic index of the gas into account, and dynamically corrects the legal boundaries of the action space according to the temperature and pressure parameters monitored in real time; and the single Q-table reinforcement learning machine guided by physical constraints introduces fluid conservation laws into a physical constraint layer so as to automatically eliminate non-physical action instructions that would cause mass non-conservation or abrupt energy changes.
8. The semiconductor valve analysis system according to claim 7, wherein, in the value update logic of the single Q-table reinforcement learning machine, the state space is defined by the pressure difference, the flow deviation and the valve opening variation at the current moment; the discount factor in the core update function is set to 0.95; the learning rate follows a dynamic decay strategy, with an initial learning rate of 0.1 that gradually decays to 0.001 as the number of training steps increases; and the reward function comprises a goal achievement reward and a physical violation penalty, a positive reward being given if the system reaches a preset pressure stabilization point and a negative reward being given if a physical constraint limit is triggered.
9. The semiconductor valve analysis system according to claim 8, wherein the action step size in the single Q-table reinforcement learning machine is subjected to nonlinear quantization: the resolution of the action step size is set to 0.01% in the nonlinear sensitive regions where the valve opening is close to fully closed or fully open, and to 0.1% in the intermediate linear region; and when the action instructions of a plurality of consecutive periods are intercepted by a physical verification gateway, the single Q-table reinforcement learning machine guided by physical constraints triggers a model reconstruction flow and searches again for an optimal control strategy by increasing the number of random exploration steps.
10. The semiconductor valve analysis system according to claim 9, wherein the single Q-table reinforcement learning machine comprises an offline pre-training stage and an online incremental learning stage; the offline pre-training stage uses historical process data to initialize the single Q-table, and the online incremental learning stage fine-tunes the single Q-table according to real-time feedback during production; and, when performing an action update, the single Q-table reinforcement learning machine further calculates an entropy value of the current state, reduces the exploration step size when the entropy value is lower than a preset entropy threshold determined from historical process data statistics, and enlarges the exploration range when the entropy value is higher than that threshold.
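
As an illustration of the value update recited in claims 1, 4 and 8, the following Python sketch shows one way a single Q-table over 1024 discrete states could be updated with the Bellman equation, a discount factor of 0.95 and a learning rate decaying from 0.1 to 0.001. It is a minimal sketch and not the patented implementation: the action set `ACTIONS`, the exponential decay form and its `decay_steps` horizon, the epsilon-greedy selection and the `safe_actions` opening check (a stand-in for the physical mechanism model) are all assumptions made for illustration.

```python
import random

N_STATES = 1024                 # discrete logic states (claim 1)
GAMMA = 0.95                    # discount factor (claim 8)
LR_START, LR_END = 0.1, 0.001   # learning-rate bounds (claim 8)

# Hypothetical action set: change in valve opening, in percent.
ACTIONS = [-0.1, -0.01, 0.0, 0.01, 0.1]


def learning_rate(step, decay_steps=10_000):
    """Decay from LR_START to LR_END; the exponential form is an assumption."""
    frac = min(step / decay_steps, 1.0)
    return LR_START * (LR_END / LR_START) ** frac


def safe_actions(opening):
    """Stand-in for the physical mechanism model (assumption):
    allow only actions that keep the valve opening within 0..100 %."""
    return [i for i, a in enumerate(ACTIONS) if 0.0 <= opening + a <= 100.0]


def choose_action(q, s, allowed, epsilon, rng=random):
    """Epsilon-greedy choice restricted to the physically allowed actions."""
    if rng.random() < epsilon:
        return rng.choice(allowed)
    return max(allowed, key=lambda i: q[s][i])


def q_update(q, s, a, reward, s_next, next_allowed, step):
    """One Bellman (Q-learning) update on the single table, in place.

    The max over next-state values is taken only over allowed actions,
    so values of physically illegal actions never propagate.
    """
    best_next = max(q[s_next][i] for i in next_allowed)
    target = reward + GAMMA * best_next
    q[s][a] += learning_rate(step) * (target - q[s][a])
    return q[s][a]


# A single table: one row per discrete state, one column per action.
q_table = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
```

Restricting both the action choice and the maximum over next-state values to the allowed indices is one simple way to keep the learned values consistent with the boundary limitation. The penalty reduction and redirection to an adjacent safe interval described in claim 4 are not modeled here.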

Description

Semiconductor valve analysis system integrating single Q-table reinforcement learning and multi-head attention

Technical Field

The invention belongs to the technical field of artificial intelligence and semiconductor manufacturing process monitoring, and particularly relates to a semiconductor valve analysis system integrating single Q-table reinforcement learning and multi-head attention.

Background

Semiconductor manufacturing is the core foundation of the modern electronics industry, and its process flows place extremely stringent demands on environmental parameters and fluid control accuracy. To achieve autonomous regulation of complex physical processes, intelligent analysis and control systems based on reinforcement learning are widely applied in precision manufacturing. By establishing an interactive feedback mechanism between an agent and its environment, they aim to replace traditional manual experience or fixed control logic with real-time evaluation and parameter optimization of the running state of semiconductor production equipment. The semiconductor valve analysis system is a key link in controlling fluid delivery, and its performance directly determines the pressure stability in the process chamber and the flow accuracy of the chemical gases. Such systems typically use a reinforcement learning framework to model the relationship between the valve's action response and the system state, and, through continuous iteration of the state-action value function, seek optimal control paths and analysis strategies in a dynamically changing production environment so as to meet the rigorous stability requirements of nanoscale manufacturing processes.

The prior art relies mainly on purely data-driven reinforcement learning algorithms, which, when faced with rare working conditions or nonlinear disturbances in the semiconductor manufacturing process, often expose the defect that their decisions lack physical logic constraints. Because the model lacks a deep characterization of physical mechanisms such as fluid dynamics, the system is highly prone to outputting non-physical decision instructions that violate physical laws, such as operations that would require a theoretically infeasible instantaneous reverse jump in pressure, and which in real operation can damage high-value equipment. Meanwhile, because the model has to explore a huge state space in an undirected manner, existing systems suffer from low sample efficiency, so that the training period is too long to adapt to process requirements that call for rapid switching. In addition, traditional analysis methods associate information only weakly when processing multidimensional process features and cannot effectively capture the deep dependencies among key parameters, so that decision accuracy and convergence rate under multiple constraints cannot meet the requirements of industrial-grade application.

Disclosure of Invention

The invention aims to provide a semiconductor valve analysis system integrating single Q-table reinforcement learning and multi-head attention, so as to solve the technical problems of inaccurate decision-making caused by missing physical logic, low sample efficiency and weak multidimensional feature relevance in existing semiconductor valve control technology.

The technical scheme of the invention is a semiconductor valve analysis system integrating single Q-table reinforcement learning and multi-head attention, comprising: a multidimensional sensing front end for synchronously collecting, in real time, physical parameters of the fluid, mechanical state parameters of the valve and environmental interference parameters in the semiconductor process chamber; a heterogeneous data preprocessing module for performing time-domain feature extraction, frequency-domain transformation and data normalization on the original signals acquired by the multidimensional sensing front end, and constructing an initial state vector sequence with unified time-scale characteristics; a multi-head attention feature fusion engine for calculating, based on the initial state vector sequence, the association weights among different physical dimensions using a plurality of self-attention heads in parallel, identifying the key process features that have a decisive influence on system stability, and generating an enhanced global characterization vector; a single Q-table reinforcement learning machine guided by physical constraints, for receiving the enhanced global characterization vector, storing state-action value information through a preset single Q-table structure, and limiting the action space to legal boundaries by means of an embedded physical mechanism model so as to output an optimal valve analysis instruction