
CN-122020424-A - Human indoor behavior intention reasoning method, system, electronic equipment and storage medium based on deep inverse reinforcement learning

CN122020424A

Abstract

The invention provides a human indoor behavior intention reasoning method based on deep inverse reinforcement learning, comprising: acquiring time-series data of a target indoor environment over a historical period, the time-series data comprising environment state parameters, behavior data and context data; constructing an implicit reward function model; performing deep inverse reinforcement learning training on the implicit reward function model with the time-series data; and predicting the future behavior intention of a target object by inference with the trained implicit reward function model. By constructing a deep neural network with a factorization layer and an adaptive weight layer, and combining loss functions for behavior distribution matching and physical result alignment, the method realizes joint prediction of future behavior and air quality. It addresses the poor generalization of traditional methods and their inability to handle the dynamic closed loop between people and the environment, laying a core foundation for accurate indoor air quality prediction and personalized control.
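The reward model described in the abstract and claims 2 and 3 can be sketched as a habit encoder, a factorization network of parallel sub-reward heads (one per environmental factor), and an adaptive weight network. The following is a minimal numpy sketch under hypothetical assumptions: all dimensions, the random weights, and the tiny one-layer networks are illustrative stand-ins, not the patent's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the patent does not fix any dimensions.
CTX_DIM, EMB_DIM, STATE_DIM = 6, 4, 8
FACTORS = ["temperature", "humidity", "co2", "tvoc",
           "formaldehyde", "particulate", "illuminance", "noise"]

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Habit encoder: maps context data to a personal habit embedding vector.
W_enc = rng.normal(size=(EMB_DIM, CTX_DIM))

# Factorization network: one parallel sub-reward head per environmental
# factor, each scoring the state together with the habit embedding.
heads = [rng.normal(size=STATE_DIM + EMB_DIM) for _ in FACTORS]

# Adaptive weight network: per-factor weights from the state and the habit
# embedding, so factor importance adapts to the person and the situation.
W_w = rng.normal(size=(len(FACTORS), STATE_DIM + EMB_DIM))

def implicit_reward(state, context):
    z = relu(W_enc @ context)                       # habit embedding
    x = np.concatenate([state, z])
    sub_rewards = np.array([h @ x for h in heads])  # one sub-reward per factor
    weights = softmax(W_w @ x)                      # adaptive factor weights
    return float(weights @ sub_rewards)             # weighted total reward

state = rng.normal(size=STATE_DIM)    # e.g. normalized sensor readings
context = rng.normal(size=CTX_DIM)    # e.g. time of day, occupancy, identity
print(round(implicit_reward(state, context), 3))
```

The factorization into per-factor heads mirrors claim 3's "plurality of parallel sub-reward networks", and the softmax keeps the adaptive weights normalized across the eight factors of claim 4.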

Inventors

  • Liu Jinhua
  • Zhao Peng
  • Ning Zhanwu
  • Zhang Yanni
  • Sun Pi
  • Liu Ning
  • Liu Weijie
  • Jia Yiting
  • Ren Diefan

Assignees

  • Institute of Urban Safety and Environmental Science, Beijing Academy of Science and Technology (北京市科学技术研究院城市安全与环境科学研究所)

Dates

Publication Date
2026-05-12
Application Date
2026-01-31

Claims (9)

  1. A human indoor behavior intention reasoning method based on deep inverse reinforcement learning, characterized by comprising the following steps: acquiring time-series data of a target indoor environment over a historical period, wherein the time-series data comprise environment state parameters, behavior data and context data; constructing an implicit reward function model; performing deep inverse reinforcement learning training on the implicit reward function model with the time-series data; and performing inference prediction of the future behavior intention of a target object with the trained implicit reward function model.
  2. The human indoor behavior intention reasoning method based on deep inverse reinforcement learning of claim 1, wherein the implicit reward function model comprises a habit encoder for mapping the context data to a personal habit embedding vector, a factorization network for generating sub-reward values from the time-series data, the personal habit embedding vector and the environmental factors, and an adaptive weight network for generating environmental factor weights from the environment state parameters and the personal habit embedding vector.
  3. The human indoor behavior intention reasoning method based on deep inverse reinforcement learning of claim 2, wherein the factorization network comprises a plurality of parallel sub-reward networks, each sub-reward network acting on one environmental factor and generating the sub-reward value for that factor.
  4. The human indoor behavior intention reasoning method based on deep inverse reinforcement learning of claim 2, wherein the environmental factors include temperature, humidity, carbon dioxide concentration, TVOC concentration, formaldehyde concentration, fine particulate matter concentration, illuminance and noise.
  5. The human indoor behavior intention reasoning method based on deep inverse reinforcement learning of claim 1, wherein performing deep inverse reinforcement learning training on the implicit reward function model with the time-series data comprises: training the implicit reward function model in a simulation environment with a forward reinforcement learning algorithm to obtain a stochastic policy, wherein the simulation environment is constructed from a physical dynamics model, and the physical dynamics model is built on mass conservation and heat balance equations; generating random trajectories in the simulation environment according to the stochastic policy; computing a first loss from the random trajectories; computing a second loss from the physical dynamics model and the time-series data; constructing a total loss function from the first loss and the second loss; and updating the reward function of the implicit reward function model by gradient descent on the total loss function.
  6. The human indoor behavior intention reasoning method based on deep inverse reinforcement learning of claim 5, wherein performing inference prediction of the future behavior intention of the target object with the trained implicit reward function model comprises: inputting the time-series data at the current moment to the trained implicit reward function model to obtain a probability distribution; sampling the probability distribution to obtain a predicted immediate behavior; feeding the time-series data at the current moment and the immediate behavior through the physical dynamics model to obtain the environmental state at the next moment; and replacing the environment state parameters with the environmental state at the next moment and repeating the prediction through the physical dynamics model to obtain an environmental state prediction sequence.
  7. A human indoor behavior intention reasoning system based on deep inverse reinforcement learning, comprising: a data acquisition module for acquiring time-series data of a target indoor environment over a historical period, the time-series data comprising environment state parameters, behavior data and context data; a model building module for building an implicit reward function model; a model training module for performing deep inverse reinforcement learning training on the implicit reward function model with the time-series data; and a behavior reasoning module for performing inference prediction of the future behavior intention of the target object with the trained implicit reward function model.
  8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the behavior intention reasoning method of any one of claims 1 to 7.
  9. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the behavior intention reasoning method of any one of claims 1 to 7.
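The training procedure in claim 5 combines a first loss computed from sampled trajectories (behavior distribution matching) with a second loss tying the reward to the physical dynamics model (physical result alignment), and updates the reward by gradient descent on their sum. Below is a minimal numpy sketch under toy assumptions: a linear reward, randomly perturbed states standing in for the forward-RL policy rollouts, and an L2 penalty standing in for the physics-alignment term. None of these simplifications are the patent's actual losses; the sketch only shows the update structure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear reward r(s) = theta . s over a 3-dim state (hypothetical size).
theta = np.zeros(3)
expert_states = rng.normal(size=(100, 3)) + 1.0  # stand-in demonstration data
LR, LAMBDA = 0.1, 0.5

for step in range(200):
    # Stand-in for rollouts of the current stochastic policy in simulation;
    # a real implementation would run the forward-RL algorithm here.
    policy_states = rng.normal(size=(100, 3))

    # First loss (behavior distribution matching): in feature-matching IRL
    # its gradient is the expert feature mean minus the policy feature mean.
    grad_match = expert_states.mean(axis=0) - policy_states.mean(axis=0)

    # Second loss (physical result alignment): here a simple L2 pull of
    # theta toward a physics-consistent prior (zero, purely illustrative).
    grad_phys = -LAMBDA * theta

    # Gradient step on the total objective (descent on the total loss).
    theta += LR * (grad_match + grad_phys)
```

With these stand-ins, theta settles near twice the expert feature mean, illustrating how the matching term pulls the reward toward the demonstrated behavior while the alignment term regularizes it.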

Description

Human indoor behavior intention reasoning method, system, electronic equipment and storage medium based on deep inverse reinforcement learning

Technical Field

The invention relates to the technical field of environment perception, and in particular to a human indoor behavior intention reasoning method, system, electronic equipment and storage medium based on deep inverse reinforcement learning.

Background

Accurate indoor environment prediction and control are central to building healthy, comfortable and energy-saving intelligent spaces. The fundamental challenge is how to model human behavior as a core dynamic variable of the system. The prior art, however, shows a clear evolution path with fundamental limitations in how behavior is handled:

1. Existing methods rely too heavily on statistical regularities in historical behavior data, for example predicting by mining frequent patterns in a user's behavior sequence. Because the environment state is not fed into the model, such methods can only predict high-probability behaviors in a stable environment; when environmental conditions such as indoor air quality, temperature or humidity change, prediction fails entirely and adaptability is lacking.

2. To improve robustness, other existing methods introduce environmental data, but the fusion is shallow: environmental features are simply concatenated with behavior features and fed into a prediction model, or environmental parameters serve as state-transition triggers for preset rules. These methods treat the environment only as a static associated feature or a hard rule condition. They cannot establish a dynamic, causal interaction model between behavioral decisions and environmental states, cannot explain how the environment influences behavior, and generalize poorly under sudden environmental changes or in new scenarios.

3. Control systems can dynamically adjust their strategy according to real-time environmental parameters, but the behavior logic of such a strategy is an optimization function preset by engineers and centered on system energy efficiency; it is not a model of true human intent, centered on personal comfort and inverted from data, so personalized service is lost.

4. Although recent inverse reinforcement learning (IRL) works on structured tasks with clear goals and clear rules, indoor human comfort is an unstructured preference problem with subjective goals, multidimensional competition and high individuality, so existing IRL frameworks cannot be applied directly.

Disclosure of Invention

The invention aims to provide a human indoor behavior intention reasoning method, system, electronic equipment and storage medium based on deep inverse reinforcement learning, which realize joint and dynamic prediction of future human behavior intention and indoor air quality state by constructing an implicit reward function. To achieve the above object, the invention provides the following solution: a human indoor behavior intention reasoning method based on deep inverse reinforcement learning, comprising the following steps: acquiring time-series data of a target indoor environment over a historical period, wherein the time-series data comprise environment state parameters, behavior data and context data; constructing an implicit reward function model; performing deep inverse reinforcement learning training on the implicit reward function model with the time-series data; and performing inference prediction of the future behavior intention of the target object with the trained implicit reward function model.
Optionally, the implicit reward function model comprises a habit encoder for mapping the context data to a personal habit embedding vector, a factorization network for generating sub-reward values from the time-series data, the personal habit embedding vector and the environmental factors, and an adaptive weight network for generating environmental factor weights from the environment state parameters and the personal habit embedding vector. Optionally, the factorization network comprises a plurality of parallel sub-reward networks, each acting on one environmental factor and generating that factor's sub-reward value. Optionally, the environmental factors include temperature, humidity, carbon dioxide concentration, TVOC concentration, formaldehyde concentration, fine particulate matter concentration, illuminance and noise. Optionally, performing deep inverse reinforcement learning training on the implicit reward function model with the time-series data includes: training the implicit reward function model in a simulation environment with a forward reinforcement learning algorithm to obtain a stochastic policy, wherein the simulation environment is constructed from a physical dynamics model, and the physical dynamics model
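The inference loop of claim 6 — sample a behavior from the model's distribution, advance the physical dynamics model, and feed the new state back in — can be sketched with a single-zone CO2 mass-balance as the dynamics model. All constants (room volume, airflow, CO2 generation rate, outdoor concentration) and the softmax stand-in for the trained reward model are hypothetical illustrations, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical single-zone CO2 mass-conservation model:
#   C[t+1] = C[t] + dt * (G*n/V - q*(C[t] - C_out)/V)
V, Q, G, C_OUT, DT = 50.0, 0.02, 1.1e-5, 420.0, 60.0  # m^3, m^3/s, m^3/s/person, ppm, s

ACTIONS = ["open_window", "close_window", "no_op"]

def behavior_policy(co2_ppm):
    """Stand-in for the trained implicit reward model: returns a probability
    distribution over behaviors given the current state (softmax of scores)."""
    scores = np.array([co2_ppm / 400.0, -co2_ppm / 400.0, 0.0])
    e = np.exp(scores - scores.max())
    return e / e.sum()

def dynamics_step(co2_ppm, action, occupants=2):
    """Advance the physical dynamics model one step under the chosen behavior."""
    q = Q * (10.0 if action == "open_window" else 1.0)  # window boosts airflow
    source = G * occupants / V * 1e6                    # ppm/s added by occupants
    vent = q * (co2_ppm - C_OUT) / V                    # ppm/s removed by airflow
    return co2_ppm + DT * (source - vent)

# Claim-6 style rollout: sample a behavior, step the dynamics, feed the
# next-moment state back in, and collect the environmental state sequence.
co2 = 800.0
trajectory = []
for _ in range(10):
    probs = behavior_policy(co2)
    action = ACTIONS[rng.choice(len(ACTIONS), p=probs)]
    co2 = dynamics_step(co2, action)
    trajectory.append((action, round(co2, 1)))
```

High CO2 makes "open_window" more probable, which in turn lowers the predicted CO2 — the dynamic human-environment closed loop the abstract says traditional methods cannot handle.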