CN-121977274-A - Air conditioner, control method thereof, storage medium, and program product


Abstract

The application relates to an air conditioner, a control method thereof, a storage medium, and a program product. The method comprises: obtaining set parameters of the air conditioner and state information of a target user, wherein the state information comprises behavior parameters, physiological parameters, and environment parameters; determining, according to the set parameters and the state information and based on a pre-trained weight distribution model, weight coefficients corresponding to a plurality of optimization targets in a first learning model, wherein the plurality of optimization targets comprise comfort of the target user and at least one of adjustment precision of the air conditioner and energy consumption level of the air conditioner; and inputting the set parameters, the state information, and the weight coefficients into the first learning model to obtain a control instruction for controlling operation of the air conditioner. The method enables the air conditioner to respond accurately to the requirements of target users in different scenes, thereby improving user experience.

Inventors

RONG GUANGWEI
LAN KEWEI
Xin Kaikai
GUAN ZHENBIN

Assignees

极凛科技(上海)有限公司

Dates

Publication Date: 20260505
Application Date: 20260331

Claims (10)

1. A control method of an air conditioner, the method comprising: acquiring set parameters of the air conditioner and state information of a target user, wherein the state information comprises behavior parameters, physiological parameters, and environment parameters; determining, according to the set parameters and the state information and based on a pre-trained weight distribution model, weight coefficients corresponding to a plurality of optimization targets in a first learning model, wherein the plurality of optimization targets comprise comfort of the target user and at least one of adjustment precision of the air conditioner and energy consumption level of the air conditioner; and inputting the set parameters, the state information, and the weight coefficients into the first learning model to obtain a control instruction for controlling operation of the air conditioner.
2. The method according to claim 1, wherein the first learning model is a reinforcement learning model, and the reward function R of the first learning model is expressed by the following formula: [formula not reproduced in the source text], wherein RC is the comfort reward value of the target user, W1 is the weight coefficient corresponding to the comfort reward value, RA is the adjustment precision reward value of the air conditioner, W2 is the weight coefficient corresponding to the adjustment precision reward value, RE is the energy consumption level reward value of the air conditioner, and W3 is the weight coefficient corresponding to the energy consumption level reward value.
3. The method according to claim 2, wherein the weight distribution model is a deep deterministic policy gradient (DDPG) model, and the training goal of the deep deterministic policy gradient model is to maximize the cumulative reward value of the reward function R.
4. The method according to claim 2, wherein the behavior parameters include an active adjustment instruction, the physiological parameters include a current PMV comfort index, and the comfort reward value RC is determined based on the following formula: [formula not reproduced in the source text], wherein RF is the user adjustment reward value, N1 is a first preset weight, PMV_A is the current PMV comfort index, PMV_T is a preset target PMV comfort index, and N2 is a second preset weight.
5. The control method of an air conditioner according to claim 4, further comprising: acquiring a first historical adjustment instruction of the target user, wherein the first historical adjustment instruction is the historical adjustment instruction that is closest in time to the active adjustment instruction within a preset past time period; comparing the active adjustment instruction with the first historical adjustment instruction to obtain a plurality of adjustment amplitudes of the active adjustment instruction, wherein each adjustment amplitude is the relative change between a set parameter of the active adjustment instruction and the corresponding set parameter in the first historical adjustment instruction; if at least one of the adjustment amplitudes is greater than or equal to a preset adjustment amplitude, determining the user adjustment reward value according to a preset adjustment reward value calculation rule and the adjustment amplitudes, wherein the user adjustment reward value is positively related to the adjustment amplitude; and if the adjustment amplitudes are smaller than the preset adjustment amplitude, setting the user adjustment reward value to a first preset reward value, wherein the first preset reward value is smaller than zero.
6. The method according to any one of claims 2 to 5, wherein the set parameters of the air conditioner include a set temperature, a set wind speed, and a set wind direction, the environment parameters include a current environment temperature, a current wind speed of the air conditioner, and a current wind direction of the air conditioner, and the adjustment precision reward value is determined based on the following formula: [formula not reproduced in the source text], wherein RA is the adjustment precision reward value, delta_T is the difference between the set temperature and the current environment temperature, T0 is a preset reference temperature, delta_V is the difference between the set wind speed and the current wind speed, V0 is a preset reference wind speed, delta_D is the angle difference between the set wind direction and the current wind direction, and D0 is a preset reference angle.
7. An air conditioner, the air conditioner comprising: an acquisition module configured to acquire set parameters of the air conditioner and state information of a target user, wherein the state information comprises behavior parameters, physiological parameters, and environment parameters; a first determining module configured to determine, according to the set parameters and the state information and based on a pre-trained weight distribution model, weight coefficients corresponding to a plurality of optimization targets in a first learning model, wherein the plurality of optimization targets comprise comfort of the target user and at least one of adjustment precision of the air conditioner and energy consumption level of the air conditioner; and a second determining module configured to input the set parameters, the state information, and the weight coefficients into the first learning model to obtain a control instruction for controlling operation of the air conditioner.
8. An air conditioner comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of claims 1 to 6.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
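
The adjustment-amplitude rule of claim 5 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the claimed implementation: the function names, the default threshold of 0.05, the linear rule `gain * max(amplitudes)`, and the penalty of -0.1 are all hypothetical. The claims only require that the user adjustment reward value be positively related to the adjustment amplitude and that the first preset reward value be smaller than zero.

```python
from typing import Dict, Mapping


def relative_changes(active: Mapping[str, float],
                     previous: Mapping[str, float]) -> Dict[str, float]:
    """Relative change of each set parameter versus the previous instruction."""
    changes = {}
    for name, new_value in active.items():
        old_value = previous[name]
        # Assumption: fall back to a unit baseline when the old value is zero.
        baseline = abs(old_value) if old_value != 0 else 1.0
        changes[name] = abs(new_value - old_value) / baseline
    return changes


def user_adjustment_reward(active: Mapping[str, float],
                           previous: Mapping[str, float],
                           threshold: float = 0.05,
                           gain: float = 1.0,
                           small_change_reward: float = -0.1) -> float:
    """Hypothetical RF: grows with the amplitude once any amplitude reaches the threshold."""
    amplitudes = relative_changes(active, previous)
    if any(a >= threshold for a in amplitudes.values()):
        # Assumed calculation rule: linear in the largest amplitude.
        return gain * max(amplitudes.values())
    # All amplitudes are below the threshold: return the negative preset value.
    return small_change_reward


if __name__ == "__main__":
    last = {"temperature": 26.0, "wind_speed": 3.0}
    now = {"temperature": 24.0, "wind_speed": 3.0}
    print(user_adjustment_reward(now, last))
```

In this sketch the "first historical adjustment instruction" is simply the `previous` mapping; selecting it from a history within a preset time period is left out.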

Description

Air conditioner, control method thereof, storage medium, and program product

Technical Field

The present application relates to the field of air conditioning technologies, and in particular, to an air conditioner, a control method thereof, a storage medium, and a program product.

Background

With the development of artificial intelligence technology, intelligent control of air conditioners is gradually moving from single-objective optimization to synchronous multi-objective optimization. Specifically, some air conditioners assign fixed weights to multiple optimization objectives to train an artificial intelligence model. In actual operation, the air conditioner inputs acquired state information, such as the physiological parameters of the user and the environment parameters, into the trained artificial intelligence model and controls its operation according to the air conditioner instruction output by the model, thereby realizing multi-objective optimization.

However, in different scenes, target users place different emphasis on the different optimization targets, and a fixed weight distribution cannot adapt flexibly to these changes in emphasis. As a result, the air conditioner has difficulty responding accurately to user demands in different scenes, and the user experience is poor.

Disclosure of Invention

In view of the above, the application provides a control method of an air conditioner that enables the air conditioner to respond accurately to the requirements of target users in different scenes, thereby improving the user experience.
The application provides a control method of an air conditioner, which comprises: obtaining set parameters of the air conditioner and state information of a target user, wherein the state information comprises behavior parameters, physiological parameters, and environment parameters; determining, according to the set parameters and the state information and based on a pre-trained weight distribution model, weight coefficients corresponding to a plurality of optimization targets in a first learning model, wherein the plurality of optimization targets comprise comfort of the target user and at least one of adjustment precision of the air conditioner and energy consumption level of the air conditioner; and inputting the set parameters, the state information, and the weight coefficients into the first learning model to obtain a control instruction for controlling operation of the air conditioner.

Optionally, the first learning model is a reinforcement learning model, and the reward function R of the first learning model is represented by the following formula: [formula not reproduced in the source text], wherein RC is the comfort reward value of the target user, W1 is the weight coefficient corresponding to the comfort reward value, RA is the adjustment precision reward value of the air conditioner, W2 is the weight coefficient corresponding to the adjustment precision reward value, RE is the energy consumption level reward value of the air conditioner, and W3 is the weight coefficient corresponding to the energy consumption level reward value.

Optionally, the weight distribution model is a pre-trained deep deterministic policy gradient (DDPG) model, and the training target of the deep deterministic policy gradient model is to maximize the cumulative reward value of the reward function R.
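
A minimal Python sketch of how the weight coefficients W1, W2, W3 and the reward values RC, RA, RE might be combined is given here. The formula for R is not reproduced in this text, so the weighted sum is an assumption inferred from the symbol definitions. The names `RewardTerms`, `normalize_weights`, and `combined_reward` are hypothetical, and the softmax normalization merely stands in for the output of the pre-trained weight distribution model, which the text does not specify.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RewardTerms:
    comfort: float    # RC, comfort reward value of the target user
    precision: float  # RA, adjustment precision reward value
    energy: float     # RE, energy consumption level reward value


def normalize_weights(raw_scores: Sequence[float]) -> Tuple[float, ...]:
    """Assumed post-processing: softmax so the weights are positive and sum to 1."""
    peak = max(raw_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in raw_scores]
    total = sum(exps)
    return tuple(value / total for value in exps)


def combined_reward(terms: RewardTerms, weights: Sequence[float]) -> float:
    """Assumed form of the reward function: R = W1*RC + W2*RA + W3*RE."""
    w1, w2, w3 = weights
    return w1 * terms.comfort + w2 * terms.precision + w3 * terms.energy


if __name__ == "__main__":
    # Hypothetical raw scores, as if produced by the weight distribution model.
    weights = normalize_weights([1.2, 0.3, -0.5])
    print(weights, combined_reward(RewardTerms(1.0, 0.5, -0.2), weights))
```

Because the weights are recomputed from the current set parameters and state information, the same reward values can yield a different R in different scenes, which is the behavior the fixed-weight approach described in the background lacks.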
Optionally, the behavior parameters include an active adjustment instruction, the physiological parameters include a current PMV comfort index, and the comfort reward value RC is determined based on the following formula: [formula not reproduced in the source text], wherein RF is the user adjustment reward value, N1 is a first preset weight, PMV_A is the current PMV comfort index, PMV_T is a preset target PMV comfort index, and N2 is a second preset weight.

The method further comprises: obtaining a first historical adjustment instruction of the target user, wherein the first historical adjustment instruction is the historical adjustment instruction that is closest in time to the active adjustment instruction within a preset past time period; comparing the active adjustment instruction with the first historical adjustment instruction to obtain a plurality of adjustment amplitudes of the active adjustment instruction, wherein each adjustment amplitude is the relative change between a set parameter of the active adjustment instruction and the corresponding set parameter in the first historical adjustment instruction; determining the user adjustment reward value according to a preset adjustment reward value calculation rule if at least one adjustment amplitude is greater than or equal to a preset adjustment amplitude, determining the user adjustment reward value according to the plurality of adju