CN-121998020-A - Reinforcement learning processing method, storage medium, electronic device and computer program product
Abstract
The embodiment of the application provides a reinforcement learning processing method, a storage medium, an electronic device and a computer program product, wherein the method comprises the steps of: obtaining result feedback according to model inference of a machine learning model used for output analysis, generating rewards and/or states required for reinforcement learning according to the result feedback, and obtaining action feedback from a first entity according to the rewards and/or the states. This solves the problems in the related art that the core network does not support reinforcement-learning-related techniques, so that a real-time analysis quality feedback mechanism is unavailable and model enhancement and analysis quality monitoring cannot be performed: reinforcement-learning-related techniques are supported in the core network, a real-time analysis quality feedback mechanism is available in the core network, and model enhancement and analysis quality monitoring can also be performed.
Inventors
- Feng Yuang
- Zhu Jinguo
- Qu Zhicheng
- Xie Pengxiang
Assignees
- ZTE Corporation (中兴通讯股份有限公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20241108
Claims (20)
- 1. A reinforcement learning processing method, applied to an environment interpreter, the method comprising: obtaining result feedback according to model inference of a machine learning model used for output analysis, and generating rewards and/or states required for reinforcement learning according to the result feedback; and obtaining action feedback from a first entity according to the rewards and/or the states.
- 2. The method of claim 1, wherein the method further comprises: applying the action feedback to ensure an analysis output quality or a model quality of the machine learning model.
- 3. The method of claim 2, wherein applying the action feedback to ensure the analysis output quality or the model quality of the machine learning model comprises: performing, on the machine learning model according to indication information of the action feedback, at least one of the following: using a new machine learning model for model inference; updating the machine learning model and performing model inference; continuing to use the machine learning model for inference; and acquiring a new machine learning model from another network element and performing model inference.
- 4. The method of claim 3, wherein the indication information of the action feedback comprises at least one of the following: first indication information indicating use of a new machine learning model, wherein the first indication information comprises at least one of a new model file, a model file link of the new machine learning model, and an ID of the new machine learning model; second indication information indicating to update the machine learning model, wherein the second indication information carries model parameter information to be updated and corresponding values; third indication information indicating to continue using the machine learning model; and fourth indication information indicating to acquire a new machine learning model from another network element, wherein the fourth indication information comprises identification information or address information of the other network element.
- 5. The method of claim 1, wherein obtaining action feedback from the first entity according to the rewards and/or the states comprises: transmitting the rewards and/or the states to the first entity; and receiving action feedback for the machine learning model determined by the first entity according to the rewards and/or the states.
- 6. The method of claim 1, wherein obtaining result feedback according to model inference of a machine learning model used for output analysis, and generating rewards and/or states required for reinforcement learning according to the result feedback comprises: under the condition that the environment interpreter is arranged in a second entity, performing model inference using the machine learning model, and transmitting a model inference result to a 5GC network element; receiving result feedback, generated by the 5GC network element, on the model inference result; and converting the result feedback into the rewards and/or the states.
- 7. The method of claim 6, wherein the method further comprises: receiving an analysis quality monitoring request sent by the 5GC network element when subscribing to the analysis; and deciding on and initiating reinforcement-learning-based analysis quality monitoring.
- 8. The method of claim 7, wherein the quality monitoring request carries an identification ID of the analysis and/or a recommendation to perform quality monitoring on the analysis through reinforcement learning.
- 9. The method of claim 6, wherein the method further comprises: receiving a termination instruction, sent by the 5GC network element, for terminating analysis quality monitoring or terminating reinforcement learning; and sending the termination instruction to the first entity, wherein the termination instruction comprises a reinforcement learning association ID.
- 10. The method of claim 6, wherein the method further comprises: after receiving an analysis request from a 5GC network element, sending a subscription request for subscribing to a machine learning model to the first entity, wherein the subscription request comprises an analysis ID; and receiving the machine learning model fed back by the first entity, wherein the first entity is configured to decide on and initiate reinforcement-learning-based model quality monitoring.
- 11. The method of claim 6, wherein the method further comprises: receiving a termination instruction of reinforcement learning or model quality monitoring sent by the first entity, wherein the termination instruction comprises a reinforcement learning association ID.
- 12. The method of claim 1, wherein obtaining result feedback according to model inference of a machine learning model used for output analysis, and generating rewards and/or states required for reinforcement learning according to the result feedback comprises: under the condition that the environment interpreter is arranged in a 5GC network element, performing model inference, evaluating a model inference result to obtain the result feedback, and generating the rewards and/or the states according to the result feedback.
- 13. The method according to claim 12, wherein the method further comprises: deciding on and initiating reinforcement-learning-based analysis quality monitoring by the 5GC network element.
- 14. The method according to claim 12, wherein the method further comprises: sending a termination instruction to the first entity, wherein the termination instruction comprises a reinforcement learning association ID.
- 15. The method according to claim 12, wherein the method further comprises: sending a subscription request for subscribing to the machine learning model to a first entity, wherein the subscription request carries whether the 5GC network element has an environment interpreter or supports a reinforcement learning capability; and receiving, under the condition that the 5GC network element has the environment interpreter or supports the reinforcement learning capability, an acquisition request from the first entity for the rewards and/or the states when model feedback is performed.
- 16. The method according to claim 12, wherein the method further comprises: receiving a termination instruction of reinforcement learning or model quality monitoring initiated by the first entity, wherein the termination instruction comprises a reinforcement learning association ID.
- 17. The method of any one of claims 1 to 16, wherein before obtaining result feedback according to model inference of a machine learning model used for output analysis, and generating rewards and/or states required for reinforcement learning according to the result feedback, the method further comprises: sending a joining request for reinforcement learning analysis output quality monitoring to the first entity; and receiving a joining response for the reinforcement learning analysis output quality monitoring returned by the first entity, wherein the joining response carries indication information of agreeing to join.
- 18. The method according to any one of claims 1 to 16, wherein the reward is a positive or negative value, and the state is a discrete value greater than one.
- 19. The method of claim 6, wherein the first entity and the second entity are each at least one of a Model Training Logical Function (MTLF), an Analytics Logical Function (AnLF), and a Network Data Analytics Function (NWDAF).
- 20. A reinforcement learning processing method, applied to a first entity, the method comprising: receiving rewards and/or states required for reinforcement learning sent by an environment interpreter, wherein the rewards and/or the states are generated by the environment interpreter according to result feedback on model inference of a machine learning model used for output analysis; deciding action feedback according to the rewards and/or the states; and sending the action feedback to the environment interpreter.
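The interaction recited in claims 1, 5 and 20 — an environment interpreter converting result feedback into rewards/states, and a first entity returning action feedback carrying one of the four indication types of claim 4 — can be sketched as follows. This is a minimal illustration only: all class names, field names, thresholds, and the mapping from feedback scores to rewards/states are assumptions of this sketch, not interfaces specified by the application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Indication(Enum):
    """The four indication-information types of action feedback (claim 4)."""
    USE_NEW_MODEL = 1        # carries a new model file, model file link, or model ID
    UPDATE_MODEL = 2         # carries the model parameters to update and their values
    CONTINUE_MODEL = 3       # keep using the current machine learning model
    FETCH_FROM_OTHER_NE = 4  # carries the ID or address of another network element

@dataclass
class ActionFeedback:
    indication: Indication
    payload: Optional[dict] = None  # model link, parameter updates, NE address, ...

class EnvironmentInterpreter:
    """Converts result feedback on model inference into a reward/state (claim 1)."""
    def to_reward_and_state(self, result_feedback: float) -> tuple[float, int]:
        # Illustrative mapping consistent with claim 18: a positive or
        # negative reward, and a discrete state greater than one.
        reward = 1.0 if result_feedback >= 0.9 else -1.0
        state = 2 if result_feedback >= 0.9 else 3
        return reward, state

class FirstEntity:
    """Decides action feedback from the reward and/or state (claim 20)."""
    def decide(self, reward: float, state: int) -> ActionFeedback:
        if reward > 0:
            return ActionFeedback(Indication.CONTINUE_MODEL)
        # On poor analysis quality, instruct the interpreter to update the model
        # (hypothetical parameter name and value, for illustration only).
        return ActionFeedback(Indication.UPDATE_MODEL,
                              payload={"learning_rate": 0.001})

# One round of the loop: interpreter -> first entity -> interpreter.
interpreter, entity = EnvironmentInterpreter(), FirstEntity()
reward, state = interpreter.to_reward_and_state(result_feedback=0.42)
action = entity.decide(reward, state)
print(action.indication.name)  # prints "UPDATE_MODEL"
```

In an actual deployment the two roles would be separate network functions exchanging these messages over service-based interfaces rather than in-process calls.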
Description
Reinforcement learning processing method, storage medium, electronic device and computer program product
Technical Field
Embodiments of the present application relate to the field of communications technologies, and in particular to a reinforcement learning processing method, a storage medium, an electronic device, and a computer program product.
Background
Reinforcement learning is applicable to a variety of scenarios, such as monitoring the quality of the analytics provided by an Analytics Logical Function (AnLF) network element of the Network Data Analytics Function (NWDAF), and monitoring the quality of the machine learning models trained by a Model Training Logical Function (MTLF). However, the core network does not support reinforcement-learning-related techniques, so there is no real-time analysis quality feedback mechanism, and model enhancement and analysis quality monitoring cannot be performed. No solution has yet been proposed for this problem in the related art.
Disclosure of Invention
The embodiments of the present application provide a reinforcement learning processing method, a storage medium, an electronic device, and a computer program product, which at least solve the problem in the related art that the core network does not support reinforcement-learning-related techniques, so that a real-time analysis quality feedback mechanism is unavailable and model enhancement and analysis quality monitoring cannot be performed.
According to an embodiment of the present application, there is provided a reinforcement learning processing method applied to an environment interpreter, the method including: obtaining result feedback according to model inference of a machine learning model used for output analysis, and generating rewards and/or states required for reinforcement learning according to the result feedback; and obtaining action feedback from the MTLF according to the rewards and/or the states. According to another embodiment of the present application, there is provided a reinforcement learning processing method applied to a first entity, the method including: receiving rewards and/or states required for reinforcement learning sent by an environment interpreter, wherein the rewards and/or the states are generated by the environment interpreter according to result feedback on model inference of a machine learning model; deciding action feedback according to the rewards and/or the states; and sending the action feedback to the environment interpreter. According to a further embodiment of the present application, there is also provided a computer program product comprising computer program instructions, wherein the computer program instructions cause a computer to carry out the steps of any of the method embodiments described above. According to a further embodiment of the application, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run. According to a further embodiment of the application, there is also provided an electronic device comprising a memory in which a computer program is stored, and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
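The embodiments above hinge on turning result feedback on model inference into the reward/state pair. As a minimal sketch of that evaluation step (the accuracy metric, the 10% tolerance, and the 0.8 threshold are assumptions of this illustration, not values specified by the application):

```python
def result_feedback(predicted, observed):
    """Assumed evaluation of inference results: the fraction of predictions
    that fall within 10% of the value later observed in the network."""
    hits = sum(1 for p, o in zip(predicted, observed)
               if abs(p - o) <= 0.1 * max(abs(o), 1e-9))
    return hits / len(predicted)

def to_reward_and_state(feedback, threshold=0.8):
    """Convert result feedback into the form of claim 18: a positive or
    negative reward, and a discrete state greater than one."""
    reward = 1.0 if feedback >= threshold else -1.0
    state = 2 if feedback >= threshold else 3
    return reward, state

# Two of the three predictions land within 10% of the observed values,
# so the feedback score is 2/3 and falls below the assumed threshold.
fb = result_feedback([100, 210, 330], [105, 200, 500])
print(to_reward_and_state(fb))  # prints "(-1.0, 3)"
```

Whether this evaluation runs in the second entity (claim 6, with the 5GC network element returning the feedback) or inside the 5GC network element itself (claim 12) only changes where the function is hosted, not its role in the loop.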
According to the embodiments of the application, result feedback is obtained according to the model inference of the machine learning model used for output analysis, rewards and/or states required for reinforcement learning are generated according to the result feedback, and action feedback is obtained from the first entity according to the rewards and/or the states. This solves the problems in the related art that the core network does not support reinforcement-learning-related techniques, so that a real-time analysis quality feedback mechanism is unavailable and model enhancement and analysis quality monitoring cannot be performed: reinforcement-learning-related techniques are supported in the core network, a real-time analysis quality feedback mechanism is available in the core network, and model enhancement and analysis quality monitoring can also be performed.
Drawings
FIG. 1 is a block diagram of a 5G system architecture including an ADRF according to an embodiment of the application; FIG. 2 is a schematic diagram of a reinforcement learning algorithm according to an embodiment of the present application; FIG. 3 is a flowchart of a reinforcement learning processing method according to an embodiment of the present application; FIG. 4 is a second flowchart