CN-122000778-A - Laser power stable control method and system


Abstract

The invention provides a laser power stable control method and system. The method comprises: building a laser power measurement system based on the optical frequency shift effect; training and optimizing a reinforcement learning model, wherein the state space of the model comprises the frequency deviation of the current control period, the frequency change rate of the current control period relative to the previous control period, and the driving signal of the previous control period, the action space of the model comprises the driving signal increment of the current control period, and the reward function of the model is used for minimizing the frequency deviation and suppressing rapid changes in frequency; obtaining the driving signal of the current control period based on the driving signal of the previous control period and the action space of the optimized reinforcement learning model; and controlling an acousto-optic driving module based on the driving signal of the current control period. The method and system address the problem of precise, stable control of laser power under dynamic variation and complex interference.

Inventors

JI QIANQIAN
CUI XIAOJUAN
HAN LEI
SUN JINGXIN
SHANG HAOSEN
SHEN TONG
XUE XIAOBO
ZHANG SHENGKANG
GE JUN

Assignees

Beijing Institute of Radio Metrology and Measurement (北京无线电计量测试研究所)

Dates

Publication Date: 20260508
Application Date: 20251226

Claims (10)

1. A laser power stabilization control method, characterized by comprising: building a laser power measurement system based on an optical frequency shift effect, wherein the laser power measurement system comprises a laser, an acousto-optic driving module, an atomic clock and a frequency measurement module, the acousto-optic driving module is used for adjusting laser power, and the frequency measurement module is used for measuring the output frequency of the atomic clock; training and optimizing a reinforcement learning model, wherein a state space of the reinforcement learning model comprises a frequency deviation of a current control period, a frequency change rate of the current control period relative to a previous control period, and a driving signal of the previous control period, the frequency deviation of the current control period is the difference between the output frequency of the atomic clock in the current control period and a standard frequency, and the frequency change rate of the current control period relative to the previous control period is the difference between the frequency deviation of the current control period and the frequency deviation of the previous control period; obtaining a driving signal of the current control period based on the driving signal of the previous control period and the action space of the optimized reinforcement learning model; and controlling the acousto-optic driving module based on the driving signal of the current control period to stabilize the laser power.
2. The method of claim 1, wherein the state space of the reinforcement learning model, comprising the frequency deviation of the current control period, the frequency change rate of the current control period relative to the previous control period, and the driving signal of the previous control period, is expressed as: $s_k = [\Delta v_k,\ \Delta v_k - \Delta v_{k-1},\ u_{k-1}]$, wherein $s_k$ is the state space of the current control period, $\Delta v_k$ is the frequency deviation of the current control period, $\Delta v_{k-1}$ is the frequency deviation of the previous control period, $\Delta v_k - \Delta v_{k-1}$ is the frequency change rate of the current control period relative to the previous control period, and $u_{k-1}$ is the driving signal of the previous control period.
3. The method of claim 2, wherein the action space is obtained based on a control strategy and the state space and lies within a preset range: $a_k = \pi_\theta(s_k)$, wherein $a_k$ is the action space of the current control period, $a_k \in [a_{\min}, a_{\max}]$, $\pi_\theta$ is the control strategy, and $\theta$ is the strategy parameter.
4. The method of claim 2, wherein the reward function of the reinforcement learning model for minimizing frequency deviation and suppressing rapid changes in frequency comprises: [reward formula not reproduced in the source text], wherein $r_k$ is the reward signal of the current control period, and $\alpha$ and $\beta$ are weight coefficients.
5. The method of claim 3, wherein obtaining the driving signal of the current control period based on the driving signal of the previous control period and the action space of the optimized reinforcement learning model comprises: $u_k = u_{k-1} + a_k$, wherein $u_k$ is the driving signal of the current control period.
6. A laser power stabilization control system, characterized by comprising a laser power measurement system, a first processing module, a reinforcement learning module and a second processing module, wherein: the laser power measurement system is built based on an optical frequency shift effect and comprises a laser, an acousto-optic driving module, an atomic clock and a frequency measurement module, the acousto-optic driving module is used for adjusting laser power, and the frequency measurement module is used for measuring the output frequency of the atomic clock; the first processing module is used for calculating the frequency deviation of the current control period and the frequency change rate of the current control period relative to the previous control period, wherein the frequency deviation of the current control period is the difference between the output frequency of the atomic clock in the current control period and the standard frequency, and the frequency change rate of the current control period relative to the previous control period is the difference between the frequency deviation of the current control period and the frequency deviation of the previous control period; the reinforcement learning module is used for deploying a trained and optimized reinforcement learning model, wherein a state space of the reinforcement learning model comprises the frequency deviation of the current control period, the frequency change rate of the current control period relative to the previous control period, and the driving signal of the previous control period; and the second processing module is used for obtaining a driving signal of the current control period based on the driving signal of the previous control period and the action space of the optimized reinforcement learning model, the driving signal of the current control period being used for controlling the acousto-optic driving module so as to stabilize the laser power.
7. The system of claim 6, wherein the state space of the reinforcement learning model, comprising the frequency deviation of the current control period, the frequency change rate of the current control period relative to the previous control period, and the driving signal of the previous control period, is expressed as: $s_k = [\Delta v_k,\ \Delta v_k - \Delta v_{k-1},\ u_{k-1}]$, wherein $s_k$ is the state space of the current control period, $\Delta v_k$ is the frequency deviation of the current control period, $\Delta v_{k-1}$ is the frequency deviation of the previous control period, $\Delta v_k - \Delta v_{k-1}$ is the frequency change rate of the current control period relative to the previous control period, and $u_{k-1}$ is the driving signal of the previous control period.
8. The system of claim 7, wherein the action space is obtained based on a control strategy and the state space and lies within a preset range: $a_k = \pi_\theta(s_k)$, wherein $a_k$ is the action space of the current control period, $a_k \in [a_{\min}, a_{\max}]$, $\pi_\theta$ is the control strategy, and $\theta$ is the strategy parameter.
9. The system of claim 7, wherein the reward function of the reinforcement learning model for minimizing frequency deviation and suppressing rapid changes in frequency comprises: [reward formula not reproduced in the source text], wherein $r_k$ is the reward signal of the current control period, and $\alpha$ and $\beta$ are weight coefficients.
10. The system of claim 8, wherein obtaining the driving signal of the current control period based on the driving signal of the previous control period and the action space of the optimized reinforcement learning model comprises: $u_k = u_{k-1} + a_k$, wherein $u_k$ is the driving signal of the current control period.
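The per-period quantities named in the claims (state, bounded action, driving signal update) can be illustrated with a minimal Python sketch. All function and field names here are hypothetical and chosen only for illustration. The reward function in particular is an assumption: the source text does not reproduce the reward formula, so the quadratic penalty below is just one plausible form that penalizes both the frequency deviation and its change, weighted by alpha and beta.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    """State s_k = [dv_k, dv_k - dv_{k-1}, u_{k-1}] (claims 2 and 7)."""
    freq_dev: float    # dv_k: atomic clock output frequency minus standard frequency
    freq_rate: float   # dv_k - dv_{k-1}: change in deviation since the previous period
    prev_drive: float  # u_{k-1}: driving signal of the previous control period


def build_state(f_out, f_std, prev_freq_dev, prev_drive):
    """Assemble the state from one frequency measurement (hypothetical helper)."""
    freq_dev = f_out - f_std
    return ControllerState(freq_dev, freq_dev - prev_freq_dev, prev_drive)


def clip_action(raw_action, a_min, a_max):
    """Keep the driving signal increment a_k inside [a_min, a_max] (claims 3 and 8)."""
    return max(a_min, min(a_max, raw_action))


def next_drive(prev_drive, action):
    """Driving signal update u_k = u_{k-1} + a_k (claims 5 and 10)."""
    return prev_drive + action


def example_reward(state, alpha, beta):
    """ASSUMED reward form, not taken from the patent: a negative weighted
    sum of the squared deviation and the squared change in deviation."""
    return -(alpha * state.freq_dev ** 2 + beta * state.freq_rate ** 2)
```

In this sketch, clipping is applied to the policy output before the update, so a single control period can never change the driving signal by more than the preset range allows.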

Description

Laser power stable control method and system

Technical Field

The invention relates to the technical field of optoelectronics, and in particular to a laser power stable control method and system.

Background

Lasers are widely used in research fields such as precision measurement, quantum computing, and quantum frequency standards. Tiny fluctuations in laser power significantly affect the accuracy and reliability of experimental results, so precise and stable control of laser power has become a research focus in the related technical fields.

Currently, laser power stabilization generally relies on conventional methods such as PID control, fuzzy control, and model predictive control (MPC). PID control, a common feedback method, regulates laser power by tuning gain parameters, but its stability and robustness are weak for strongly fluctuating, nonlinear, and time-varying systems, and in particular it cannot maintain precise control under complex external disturbances. In addition, existing control methods lack the ability to adapt to factors such as environmental changes and changes in light source characteristics, which limits their effectiveness in changing environments. Laser power measurement based on the optical frequency shift principle (AC Stark effect) can provide high-precision power measurement, but how to combine it with accurate feedback control to achieve long-term stability of laser power remains a technical problem.

Disclosure of Invention

The invention provides a laser power stable control method and system for solving the problem of precise, stable control of laser power under dynamic variation and complex interference.
In a first aspect, the present invention provides a laser power stabilization control method, including: building a laser power measurement system based on an optical frequency shift effect, wherein the laser power measurement system comprises a laser, an acousto-optic driving module, an atomic clock and a frequency measurement module, the acousto-optic driving module is used for adjusting laser power, and the frequency measurement module is used for measuring the output frequency of the atomic clock; training and optimizing a reinforcement learning model, wherein a state space of the reinforcement learning model comprises a frequency deviation of a current control period, a frequency change rate of the current control period relative to a previous control period, and a driving signal of the previous control period, the frequency deviation of the current control period is the difference between the output frequency of the atomic clock in the current control period and a standard frequency, and the frequency change rate of the current control period relative to the previous control period is the difference between the frequency deviation of the current control period and the frequency deviation of the previous control period; obtaining a driving signal of the current control period based on the driving signal of the previous control period and the action space of the optimized reinforcement learning model; and controlling the acousto-optic driving module based on the driving signal of the current control period to stabilize the laser power.
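The sequence of steps in the first aspect (measure the clock frequency, form the deviation and its change, query the model, update the driving signal) can be sketched as a closed loop in Python. Everything concrete here is an assumption for illustration: `toy_plant` is a hypothetical linear stand-in for the laser, acousto-optic module and atomic clock chain, `stand_in_policy` is a simple proportional rule used in place of a trained reinforcement learning model, and the first period takes the previous deviation as zero.

```python
def toy_plant(u, f_std=10.0e6, sensitivity=2.0, u_target=1.0):
    """HYPOTHETICAL plant: clock frequency shifts linearly with the drive signal.
    Stands in for laser + acousto-optic module + atomic clock; not from the patent."""
    return f_std + sensitivity * (u - u_target)


def stand_in_policy(state, gain=0.25):
    """Placeholder for the trained policy pi_theta(s_k); a real system would
    call the optimized reinforcement learning model here."""
    freq_dev, _freq_rate, _prev_drive = state
    return -gain * freq_dev


def run_loop(policy, plant, f_std, u0, a_min, a_max, periods):
    """Run `periods` control periods; return the final drive and deviation history."""
    u_prev = u0
    dv_prev = 0.0  # assumption: no earlier deviation is known in the first period
    history = []
    for _ in range(periods):
        dv = plant(u_prev) - f_std              # frequency deviation of this period
        state = (dv, dv - dv_prev, u_prev)      # s_k = [dv_k, dv_k - dv_{k-1}, u_{k-1}]
        a = max(a_min, min(a_max, policy(state)))  # bounded drive increment a_k
        u_prev = u_prev + a                     # u_k = u_{k-1} + a_k
        dv_prev = dv
        history.append(dv)
    return u_prev, history
```

Calling `run_loop(stand_in_policy, toy_plant, 10.0e6, 1.2, -0.05, 0.05, 20)` drives the toy plant toward zero frequency deviation. The early periods are limited by the increment bounds, which mirrors how a bounded action space keeps the driving signal from changing abruptly.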
Optionally, the state space of the reinforcement learning model, comprising the frequency deviation of the current control period, the frequency change rate of the current control period relative to the previous control period, and the driving signal of the previous control period, is expressed as:

$s_k = [\Delta v_k,\ \Delta v_k - \Delta v_{k-1},\ u_{k-1}]$

wherein $s_k$ is the state space of the current control period, $\Delta v_k$ is the frequency deviation of the current control period, $\Delta v_{k-1}$ is the frequency deviation of the previous control period, $\Delta v_k - \Delta v_{k-1}$ is the frequency change rate of the current control period relative to the previous control period, and $u_{k-1}$ is the driving signal of the previous control period.

Optionally, the action space is obtained based on a control strategy and the state space and lies within a preset range:

$a_k = \pi_\theta(s_k)$

wherein $a_k$ is the action space of the current control period, $a_k \in [a_{\min}, a_{\max}]$, $\pi_\theta$ is the control strategy, and $\theta$ is the strategy parameter.

Optionally, the reward function of the reinforcement learning model for minimizing frequency deviation and suppressing rapid changes in frequency includes: [reward formula not reproduced in the source text], wherein $r_k$ is the reward signal of the current control period, and $\alpha$ and $\beta$ are weight coefficients.

Optionally, obtaining the driving signal of the current control period based on the driving signal of the previous control period and the action space of the optimized reinforcement learning model includes:

$u_k = u_{k-1} + a_k$

wherein $u_k$ is the driving signal of the current control period.

In a second aspect, the present invention provides a laser power stab