CN-122019118-A - Model reasoning method and related device
Abstract
The application provides a model reasoning method and a related device, applied to the technical field of artificial intelligence. Compared with the prior art, in which the terminal side and the network device side each need to preload the structures of the models used in the multiple model segmentation modes and the union of the model parameter sets corresponding to those structures, the method selects, when the uplink bandwidth changes, an appropriate model structure and the model parameters corresponding to that structure for use.
Inventors
- YUAN XIAOSONG
Assignees
- HUAWEI TECHNOLOGIES CO., LTD.
Dates
- Publication Date
- 20260512
- Application Date
- 20241112
Claims (20)
- 1. A model reasoning method, applied to a first communication device, the method comprising: downloading a structure of a first model and a model parameter set corresponding to the structure of the first model, wherein the first model is used for an end network collaborative reasoning service, and the structure of the first model is a union of the structures of the models used by the first communication device in a plurality of model segmentation modes; determining a first model segmentation mode corresponding to a first time period according to a current uplink bandwidth, wherein the first model segmentation mode is related to the model parameter set corresponding to the structure of the first model; loading a first model parameter corresponding to the first model segmentation mode, wherein the first model parameter belongs to the model parameter set and is used for carrying out the end network collaborative reasoning service; and circularly executing a first operation within a preset time period until the end network collaborative reasoning service ends, wherein the preset time period consists of at least one time period, and the first time period belongs to the at least one time period; wherein the first operation comprises: determining a performance indicator associated with the first communication device according to the battery power, the remaining storage space, the central processing unit load, the graphics processing unit load, and the temperature of the first communication device, wherein the performance indicator is related to an uplink transmission rate; sending parameter information to a second communication device, wherein the parameter information comprises a channel quality indicator (CQI), a quality-of-service requirement of the end network collaborative reasoning service, an inference output dimension, an inference frequency, and the performance indicator of the first communication device, and the parameter information is used by the second communication device or a third communication device to predict an uplink bandwidth of the first communication device; determining a second model segmentation mode corresponding to the time period next to the first time period, wherein the second model segmentation mode is related to the model parameter set corresponding to the first model; and determining a model parameter processing strategy according to the second model segmentation mode.
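The terminal-side procedure of claim 1 — mapping the current uplink bandwidth to a model segmentation mode and then loading only the parameter subset that mode needs from the preloaded union — can be sketched as follows. This is a minimal illustration: the split-point names, bandwidth thresholds, and parameter-key convention are assumptions, not taken from the claims.

```python
def select_split_mode(uplink_bandwidth_mbps):
    """Map the current uplink bandwidth to a model segmentation mode.

    A wider uplink lets the terminal run fewer layers locally and upload a
    larger intermediate tensor; the thresholds here are illustrative only.
    """
    if uplink_bandwidth_mbps >= 50:
        return "split_after_layer_2"   # small on-device part, large upload
    elif uplink_bandwidth_mbps >= 10:
        return "split_after_layer_8"
    else:
        return "split_after_layer_14"  # large on-device part, small upload


class TerminalInferenceSession:
    """Holds the union of structures and loads per-mode parameter subsets."""

    def __init__(self, model_structure, parameter_set):
        # model_structure: union of structures across all split modes
        # parameter_set: union of per-mode parameter subsets (kept on disk)
        self.structure = model_structure
        self.parameter_set = parameter_set
        self.loaded = {}

    def load_parameters(self, mode):
        # Load only the parameter subset this split mode actually uses;
        # keys are assumed to be prefixed with the mode name.
        self.loaded = {k: v for k, v in self.parameter_set.items()
                       if k.startswith(mode)}
        return self.loaded
```

In use, the terminal would call `select_split_mode` once per time period and reload parameters only when the mode changes, which is the resource saving the claim targets.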
- 2. The method according to claim 1, wherein determining a second model segmentation mode corresponding to the time period next to the first time period comprises: receiving an uplink bandwidth prediction range from the second communication device; and determining the second model segmentation mode corresponding to the time period next to the first time period according to the uplink bandwidth prediction range.
- 3. The method according to claim 1, wherein determining a second model segmentation mode corresponding to the time period next to the first time period comprises: receiving a second model segmentation mode, corresponding to the time period next to the first time period, forwarded by the second communication device.
- 4. The method according to any one of claims 1 to 3, wherein circularly executing the first operation comprises: circularly executing the first operation in the case that a trigger condition is met.
- 5. The method according to any one of claims 1-4, wherein determining a model parameter processing strategy according to the second model segmentation mode comprises: loading, in the first time period, the model structure added by the second model segmentation mode relative to the first model segmentation mode and the model parameters corresponding to the added model structure, and/or unloading, in the time period next to the first time period, the model structure removed by the second model segmentation mode relative to the first model segmentation mode and the model parameters corresponding to the removed model structure.
- 6. The method according to any one of claims 1-4, wherein determining a model parameter processing strategy according to the second model segmentation mode comprises: loading the model parameters corresponding to the second model segmentation mode in the first time period, and/or, in the time period next to the first time period, running the end network collaborative reasoning service on the model parameters corresponding to the second model segmentation mode and unloading the model parameters corresponding to the first model segmentation mode.
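The model parameter processing strategies of claims 5 and 6 amount to a set difference between the model parts used by the two segmentation modes: load what the second mode adds, unload what the first mode no longer needs. A minimal sketch, with illustrative layer names:

```python
def parameter_update_plan(current_layers, next_layers):
    """Compute which model parts to load and which to unload when switching
    from the current model segmentation mode to the next one.

    A plain set difference; layer identifiers are illustrative assumptions.
    Returns (to_load, to_unload), each sorted for determinism.
    """
    to_load = sorted(set(next_layers) - set(current_layers))
    to_unload = sorted(set(current_layers) - set(next_layers))
    return to_load, to_unload
```

For example, moving from a mode that runs layers 1-3 on the device to one that runs layers 1-5 only requires loading layers 4 and 5, rather than a second full model, which is the resource saving these claims describe.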
- 7. A model reasoning method, applied to a second communication device, the method comprising: circularly executing a second operation within a preset time period until an end network collaborative reasoning service ends, wherein the preset time period consists of at least one time period; wherein the second operation comprises: receiving parameter information from a first communication device, wherein the parameter information comprises a channel quality indicator (CQI), a quality-of-service requirement of the reasoning service, an inference output dimension of the first communication device, an inference frequency, and a performance indicator of the first communication device, and the parameter information is used by the second communication device or a third communication device to predict an uplink bandwidth of the first communication device; measuring and obtaining channel quality index information of the first communication device, wherein the channel quality index information comprises a signal-to-interference-plus-noise ratio (SINR), a reference signal received quality (RSRQ), and an uplink received signal strength indicator (UL_RSSI); and sending an uplink bandwidth prediction range or a second model segmentation mode to the first communication device.
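The second operation of claim 7 combines the terminal's reported parameter information with the network device's own load and channel measurements into one record, which is then either fed to a local prediction model or forwarded to a third communication device. A hypothetical sketch of that message assembly; all field names are illustrative assumptions:

```python
def build_prediction_input(parameter_info, cell_load, sinr_db, rsrq_db, ul_rssi_dbm):
    """Merge terminal-reported fields with network-side measurements into a
    single feature record for uplink bandwidth prediction.

    parameter_info: dict reported by the terminal (e.g. CQI, QoS requirement,
    inference output dimension, inference frequency, performance indicator).
    """
    return {
        **parameter_info,
        "cell_load": cell_load,
        "channel_quality": {
            "sinr_db": sinr_db,          # signal-to-interference-plus-noise ratio
            "rsrq_db": rsrq_db,          # reference signal received quality
            "ul_rssi_dbm": ul_rssi_dbm,  # uplink received signal strength
        },
    }
```

The same record serves both branches of the claim: it is the input to the prediction model in claims 8-9 and the payload forwarded to the third device in claims 10-11.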
- 8. The method according to claim 7, wherein sending the uplink bandwidth prediction range or the second model segmentation mode to the first communication device comprises: obtaining, in a first time period, the uplink bandwidth prediction range according to a load of the second communication device, the parameter information, and the channel quality index information, wherein the first time period belongs to the at least one time period; and sending the uplink bandwidth prediction range to the first communication device.
- 9. The method according to claim 8, wherein obtaining the uplink bandwidth prediction range according to the load of the second communication device, the parameter information, and the channel quality index information in the first time period comprises: inputting, in the first time period, the load of the second communication device, the parameter information, and the channel quality index information into a prediction model to obtain the uplink bandwidth prediction range.
- 10. The method according to claim 7, wherein sending the uplink bandwidth prediction range or the second model segmentation mode to the first communication device comprises: forwarding the parameter information, the channel quality index information, and a load of the second communication device to the third communication device; receiving an uplink bandwidth prediction range from the third communication device; and forwarding the uplink bandwidth prediction range to the first communication device.
- 11. The method according to claim 7, wherein sending the uplink bandwidth prediction range or the second model segmentation mode to the first communication device comprises: forwarding the parameter information, the channel quality index information, and a load of the second communication device to the third communication device; receiving a second model segmentation mode from the third communication device; and forwarding the second model segmentation mode to the first communication device.
- 12. A model reasoning method, applied to a third communication device, the method comprising: circularly executing a third operation within a preset time period until an end network collaborative reasoning service ends, wherein the preset time period consists of at least one time period; wherein the third operation comprises: receiving parameter information, channel quality index information, and a load of a second communication device from the second communication device, wherein the parameter information comprises a channel quality indicator (CQI), a quality-of-service requirement of the reasoning service, an inference output dimension of a first communication device, an inference frequency, and a performance indicator of the first communication device, the parameter information is used by the third communication device to predict an uplink bandwidth of the first communication device, and the channel quality index information comprises a signal-to-interference-plus-noise ratio (SINR), a reference signal received quality (RSRQ), and an uplink received signal strength indicator (UL_RSSI); determining, according to the parameter information, the channel quality index information, and the load of the second communication device, an uplink bandwidth prediction range or a second model segmentation mode corresponding to the time period next to a first time period, wherein the first time period belongs to the at least one time period; and sending the uplink bandwidth prediction range or the second model segmentation mode to the second communication device.
- 13. The method according to claim 12, wherein determining, according to the parameter information, the channel quality index information, and the load of the second communication device, the uplink bandwidth prediction range or the second model segmentation mode corresponding to the time period next to the first time period comprises: inputting the load of the second communication device, the parameter information, and the channel quality index information into a prediction model to obtain the uplink bandwidth prediction range.
- 14. The method according to claim 12, wherein determining, according to the parameter information, the channel quality index information, and the load of the second communication device, the uplink bandwidth prediction range or the second model segmentation mode corresponding to the time period next to the first time period comprises: inputting the load of the second communication device, the parameter information, and the channel quality index information into a prediction model to obtain the uplink bandwidth prediction range; receiving a maximum computing power resource of the first communication device, wherein the maximum computing power resource is used for the end network collaborative reasoning service; and determining the second model segmentation mode corresponding to the time period next to the first time period according to the uplink bandwidth prediction range and the maximum computing power resource.
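Claim 14 describes choosing the next segmentation mode from a predicted uplink bandwidth range and the terminal's reported maximum computing power. A hypothetical selection rule is sketched below; the candidate modes, their uplink needs, and their compute costs are assumed numbers, and the patent does not prescribe this particular rule:

```python
def choose_split_mode(predicted_bw_range_mbps, max_device_flops, modes):
    """Pick a model segmentation mode whose uplink requirement fits the
    pessimistic (lower) end of the predicted bandwidth range and whose
    on-device compute fits the terminal's maximum computing power resource.

    modes: list of dicts with keys "name", "uplink_mbps", "device_flops"
    (an illustrative schema). Returns the chosen name, or None if no
    candidate is feasible.
    """
    low_bw, _high_bw = predicted_bw_range_mbps
    feasible = [m for m in modes
                if m["uplink_mbps"] <= low_bw
                and m["device_flops"] <= max_device_flops]
    if not feasible:
        return None
    # Among feasible modes, prefer the one that offloads the most compute
    # to the network side (smallest on-device cost).
    return min(feasible, key=lambda m: m["device_flops"])["name"]
```

Using the lower end of the range is a conservative design choice: it keeps the chosen mode viable even if the realized bandwidth lands at the bottom of the prediction.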
- 15. A model reasoning apparatus, characterized in that the model reasoning apparatus comprises a communication unit and a processing unit for performing the method according to any one of claims 1-6.
- 16. A model reasoning apparatus, characterized in that the model reasoning apparatus comprises a communication unit and a processing unit for performing the method according to any one of claims 7-11.
- 17. A model reasoning apparatus, characterized in that the model reasoning apparatus comprises a communication unit and a processing unit for performing the method according to any one of claims 12-14.
- 18. A model reasoning apparatus, characterized in that the model reasoning apparatus comprises a processor, wherein, when the processor invokes a computer program or instructions in a memory, the model reasoning apparatus is caused to implement the method according to any one of claims 1-6.
- 19. A model reasoning apparatus, characterized in that the model reasoning apparatus comprises a processor, wherein, when the processor invokes a computer program or instructions in a memory, the model reasoning apparatus is caused to implement the method according to any one of claims 7-11.
- 20. A model reasoning apparatus, characterized in that the model reasoning apparatus comprises a processor, wherein, when the processor invokes a computer program or instructions in a memory, the model reasoning apparatus is caused to implement the method according to any one of claims 12-14.
Description
Model reasoning method and related device

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a model reasoning method and a related device.

Background

With the development of artificial intelligence, applications based on artificial intelligence (AI)/machine learning (ML) model reasoning are emerging, such as image classification, speech recognition, and autonomous driving, which require increasingly high computing power to process increasingly complex AI/ML data and models. In addition, emerging media services such as virtual reality (VR), augmented reality (AR), and cloud gaming are also evolving rapidly. These applications depend heavily on AI/ML technology to implement their core functionality. For example, AR media services based on smart glasses require in-depth analysis of the surrounding environment, and AI/ML technology can provide the required environmental context information by performing object segmentation, recognition, and classification, thereby enhancing the user experience. However, the enormous computational effort required by these applications often exceeds the computing power of the end side, so splitting the AI/ML model between the end-side device and the network side, that is, end network collaborative reasoning that distributes part or even most of the computation to the network side, has become a solution path. Some model reasoning applications, such as autonomous driving, have strict requirements on end-to-end inference delay; therefore, while solving the problem of insufficient computing power resources at the end side, end network collaborative reasoning must also meet the end-to-end delay requirements of the reasoning application.
In order to ensure that the reasoning service is not interrupted when the model is replaced due to a change in the uplink bandwidth, current model reasoning applications need to load all sets of models in advance on both the end side and the network side, which wastes a great amount of resources on both sides. Therefore, how the end network collaborative reasoning service can reduce the waste of computing resources at the end side and the network side, while guaranteeing the end-to-end delay under changing uplink channel bandwidth, is a hot research topic for technicians in the field.

Disclosure of Invention

The application provides a model reasoning method and a related device, which can reduce the waste of computing resources at the terminal side and the network device side while guaranteeing the end-to-end delay. In a first aspect, the present application provides a model reasoning method, which may be applied to a first communication device. The first communication device may be, for example, a terminal or a module in the terminal (where the module in the terminal includes a communication module and a calculation module), or a circuit or a chip in the terminal that is responsible for a communication function (such as a modem chip, also called a baseband chip, or a system on chip (SoC) chip or a system in package (SiP) chip that includes a modem core). The method comprises: downloading a structure of a first model and a model parameter set corresponding to the structure of the first model, wherein the first model is used for an end network collaborative reasoning service, and the structure of the first model is a union of the structures of the models used by the first communication device in a plurality of model segmentation modes.
Determining a first model segmentation mode corresponding to the first time period according to the current uplink bandwidth, wherein the first model segmentation mode is related to the model parameter set corresponding to the structure of the first model. Loading a first model parameter corresponding to the first model segmentation mode, wherein the first model parameter belongs to the model parameter set and is used for carrying out the end network collaborative reasoning service. Circularly executing a first operation within a preset time period until the end network collaborative reasoning service ends, wherein the preset time period consists of at least one time period, and the first time period belongs to the at least one time period. The first operation comprises determining a performance indicator associated with the first communication device according to the first communication device's own battery power, remaining storage space, central processing unit load, graphics processing unit load, and temperature, wherein the performance indicator is related to an uplink transmission rate; and sending parameter information to a second communication device, wherein the parameter information comprises a channel quality indicator (CQI) and a quality-of-service requirement of the end network collaborative reasoning service
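The performance indicator in the first operation is derived from the terminal's battery power, remaining storage space, CPU load, GPU load, and temperature; the application does not specify a formula, only that the indicator is related to the achievable uplink transmission rate. A purely illustrative scalar combination, in which every weight and normalization constant is an assumption:

```python
def performance_indicator(battery_pct, free_storage_gb, cpu_load, gpu_load, temp_c):
    """Combine the terminal-state inputs named in the claim into one score
    in [0, 1]. Weights, saturation points, and the thermal model are all
    illustrative assumptions, not taken from the application.

    cpu_load and gpu_load are fractions in [0, 1]; temp_c in degrees Celsius.
    """
    battery = battery_pct / 100.0
    storage = min(free_storage_gb / 8.0, 1.0)        # saturate at 8 GB free
    headroom = 1.0 - 0.5 * (cpu_load + gpu_load)     # average compute headroom
    thermal = max(0.0, 1.0 - max(temp_c - 40.0, 0.0) / 40.0)  # derate above 40 C
    score = 0.25 * (battery + storage + headroom + thermal)
    return max(0.0, min(1.0, score))                 # clamp to [0, 1]
```

A higher score would signal to the network side that the terminal can sustain a larger on-device model share and a higher uplink transmission rate.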