CN-122020463-A - Multi-mode fusion-based senile abnormal behavior identification method and system
Abstract
The invention discloses a method and a system for identifying abnormal behaviors of elderly people based on multi-modal fusion. The method comprises: synchronously acquiring heterogeneous visual, inertial, and environmental data at the edge; extracting features for each modality after time alignment and preprocessing; dynamically correcting the weights of the visual and inertial features through a cross-modal attention adapter to relieve modal imbalance; constructing a multi-modal fusion graph; executing a graph attention mechanism (HLGAtt) in hyperbolic Lorentz space to describe the hierarchical relationships among behaviors; detecting anomalies based on a graph attention network; locating root-cause nodes via random walk; and outputting the anomaly category together with a root-cause explanation. Compared with the prior art, the method improves the F1 score of anomaly identification by more than 6%, keeps the false-alarm rate below 0.7 events/day, offers high reliability and interpretability, can be widely applied in nursing homes, communities, and homes, and provides technical support for intelligent elderly care.
Inventors
- YUAN XIANG
- JIANG ZENGSHI
- ZHU BING
- WU HAIBIN
- LIU MENG
- YUE SHUAIBO
Assignees
- 福寿康智慧医疗养老服务(上海)有限公司 (Fushoukang Smart Medical Elderly Care Services (Shanghai) Co., Ltd.)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-26
Claims (10)
- 1. A method for identifying abnormal behaviors of elderly people based on multi-modal fusion, characterized by comprising the following steps: collecting multi-modal data through multi-modal data sensors and preprocessing the multi-modal data; generating multi-modal features based on multi-modal data fusion; performing anomaly detection and anomaly-cause localization based on the multi-modal features; and verifying and analyzing the results.
- 2. The method of claim 1, wherein acquiring and preprocessing the multi-modal data comprises: for inertial sensor data, eliminating dimensional influence by the formula X' = (X − μ)/σ, wherein X is the original data, and μ and σ are the mean and standard deviation, respectively; filling missing values of the environmental sensor data by KNN interpolation to ensure data continuity; and performing normalization and feature extraction on the visual sensor data.
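The preprocessing of claim 2 can be sketched as follows; this is an illustrative Python sketch, not part of the claims, and `knn_impute` is a simplified index-distance stand-in for the KNN interpolation the claim names:

```python
import numpy as np

def zscore(x):
    """Z-score normalization from claim 2: X' = (X - mu) / sigma."""
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma

def knn_impute(series, k=2):
    """Fill NaNs with the mean of the k nearest observed samples
    (nearest by time index). A simplified stand-in for the claim's
    KNN interpolation; assumes at least one observed value exists."""
    series = series.astype(float).copy()
    observed = np.flatnonzero(~np.isnan(series))
    for i in np.flatnonzero(np.isnan(series)):
        nearest = observed[np.argsort(np.abs(observed - i))[:k]]
        series[i] = series[nearest].mean()
    return series
```

After z-scoring, a channel has zero mean and unit standard deviation, which is what "eliminating dimensional influence" amounts to here.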
- 3. The method of claim 2, wherein the normalization and feature extraction of the visual sensor data comprise: normalizing by x' = (x − x_min)/(x_max − x_min), wherein x' is the normalized value, x is the original value to be normalized, and x_max and x_min are the maximum and minimum of the original data set, thereby mapping pixel values into the [0, 1] interval; and extracting features by F = ReLU(W ∗ x' + b), wherein ReLU is the rectified linear unit activation function, W is the weight parameter of the convolutional layer, x' is the normalized input feature, and b is the bias parameter of the convolutional layer.
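A minimal sketch of claim 3 in Python (illustrative only, using a 1-D convolution in place of the unspecified convolutional layer):

```python
import numpy as np

def minmax(x):
    """Min-max normalization from claim 3: maps values into [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def conv_relu(x, w, b):
    """1-D valid cross-correlation followed by ReLU: F = ReLU(W * x' + b).
    The kernel is reversed so np.convolve performs correlation, matching
    the usual convolutional-layer convention."""
    out = np.convolve(x, w[::-1], mode="valid") + b
    return np.maximum(out, 0.0)
```

A negative bias shows the ReLU clipping: any response below zero is zeroed out before being passed on as a feature.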
- 4. The method of claim 2, further comprising aligning the sequences to handle frequency differences of the multi-modal time-series data using the formula DTW(A, B) = min_π Σ_{(i,j)∈π} |a_i − b_j|, wherein DTW(A, B) is the dynamic time warping distance between sequence A and sequence B, the minimum is taken over all legal warping paths π connecting the elements of sequence A and sequence B, a_i is the i-th element of sequence A, b_j is the j-th element of sequence B, and |a_i − b_j| is the absolute difference of the two elements.
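The dynamic time warping distance of claim 4 can be computed with the standard dynamic-programming recurrence (an illustrative sketch, not the patented implementation):

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance from claim 4:
    DTW(A, B) = min over legal warping paths of the summed |a_i - b_j|."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible path moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path may stretch one sequence against the other, two sequences sampled at different rates can still reach distance zero, which is exactly why the claim uses DTW for cross-modal frequency differences.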
- 5. The method of claim 1, wherein generating the multi-modal feature based on multi-modal data fusion comprises: given visual features F_V ∈ R^(T×D_V) and audio features F_A ∈ R^(T×D_A), wherein T is the number of time steps and D_V and D_A are the feature dimensions; concatenating the key matrix K and value matrix V of the audio features with learnable parameters P_k and P_v: K' = [K; P_k], V' = [V; P_v]; computing cross-modal attention A = softmax(Q K'^T / √d_k) V', wherein K' is the concatenated key matrix, Q is the query matrix, and softmax is the normalized exponential function used to generate the attention weights; passing the cross-modal attention feature through a bottleneck adapter H = Up(GELU(Down(A))), wherein A is the attention feature output, Down is a down-projection function that reduces the dimension of the attention features, GELU is the Gaussian error linear unit activation function introducing a nonlinear transformation, and Up is an up-projection function restoring the reduced features to the target dimension; dynamically adjusting the importance of the audio features by a learnable weight α = σ(W_m F_A), wherein W_m is a learnable modal-weighting parameter matrix, σ is the sigmoid activation function, and F_A is the original audio feature; and obtaining the final multi-modal fusion feature F_fused = FC([F_V; α ⊙ F_A]), wherein FC is a fully connected layer function performing dimension transformation or nonlinear mapping on the fused features, F_V is the original visual feature, and α is the modal weighting factor generated from the audio features.
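A minimal numpy sketch of the claim-5 fusion path, assuming D_V = D_A so the visual stream can serve directly as queries; all weight matrices here are random illustrative stand-ins, not the patent's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gelu(x):
    # tanh approximation of the Gaussian error linear unit
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def cross_modal_adapter(Fv, Fa, Pk, Pv, Wd, Wu, Wm):
    """Sketch of claim 5: audio keys/values are concatenated with learnable
    prompts, cross-modal attention is computed with the visual stream as
    queries, a bottleneck adapter (down-project, GELU, up-project) refines
    the attended features, and a sigmoid gate reweights the audio branch
    before concatenation with the visual features."""
    K = np.concatenate([Fa, Pk], axis=0)              # K' = [K ; P_k]
    V = np.concatenate([Fa, Pv], axis=0)              # V' = [V ; P_v]
    d = Fv.shape[-1]
    A = softmax(Fv @ K.T / np.sqrt(d)) @ V            # cross-modal attention
    H = gelu(A @ Wd) @ Wu                             # bottleneck adapter
    alpha = sigmoid(Fa.mean(axis=0) @ Wm)             # learnable modal gate
    return np.concatenate([Fv, H * alpha], axis=-1)   # fused feature
```

In a real system the final concatenation would be followed by the fully connected layer FC of the claim; it is omitted here to keep the sketch short.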
- 6. The method of claim 1, further comprising constructing a multi-modal fusion graph G = (V, E), wherein the nodes V represent modal feature vectors and the edges E represent the inter-modal association strength; the edge weights are calculated as α_ij = exp(LeakyReLU(a^T [W h_i ∥ W h_j])) / Σ_{k∈N(i)} exp(LeakyReLU(a^T [W h_i ∥ W h_k])), wherein α_ij is the edge weight between node i and node j, N(i) is the set of neighborhood nodes of node i, LeakyReLU is the leaky rectified linear unit activation function, a is the attention vector used to compute the association weights between features, W is a learnable weight matrix used for the linear transformation of node features, h_i, h_j, h_k are the feature vectors of nodes i, j, k, and exp is the exponential function mapping activation values to positive weights.
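The claim-6 edge weighting is the standard graph-attention scoring; a minimal sketch (illustrative parameters, single node neighborhood):

```python
import numpy as np

def gat_edge_weights(H, W, a, neighbors, i, leak=0.2):
    """Edge weights of claim 6: LeakyReLU-scored attention over the
    concatenated, linearly transformed features of node i and each
    neighbor, softmax-normalized over the neighborhood N(i)."""
    hi = H[i] @ W
    scores = []
    for j in neighbors:
        hj = H[j] @ W
        s = a @ np.concatenate([hi, hj])   # a^T [W h_i || W h_j]
        scores.append(s if s > 0 else leak * s)  # LeakyReLU
    scores = np.asarray(scores, dtype=float)
    e = np.exp(scores - scores.max())      # stabilized exponential
    return e / e.sum()                     # weights sum to 1 over N(i)
```

The subtraction of the maximum score before exponentiation does not change the normalized weights but avoids overflow, a common implementation detail the claim leaves implicit.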
- 7. The method of claim 1, further comprising capturing the hierarchical relationship between normal and abnormal behaviors via a hyperbolic Lorentz graph attention mechanism: mapping the fusion features to hyperbolic space H^n = {x ∈ R^(n+1) : ⟨x, x⟩_L = −1/c, x_0 > 0}, wherein the Lorentz inner product is calculated as ⟨x, y⟩_L = −x_0 y_0 + Σ_{i=1}^{n} x_i y_i, H^n is the n-dimensional Lorentz hyperboloid, x is an (n+1)-dimensional real vector with components x_0, x_1, …, x_n, ⟨·,·⟩_L is the Lorentz inner product, and the constraint x_0 > 0 restricts points to the upper sheet of the hyperboloid; the hyperbolic Lorentz graph attention mechanism is h_i' = exp_0^c(Σ_{j∈N(i)} α_ij log_0^c(h_j)), wherein h_i' is the hyperbolic feature vector of node i after aggregating neighborhood features, exp_0^c is the hyperbolic exponential map centered at the hyperbolic-space origin 0 with curvature parameter c, mapping tangent-space vectors back to hyperbolic space, log_0^c is the hyperbolic logarithmic map centered at the hyperbolic-space origin 0 with curvature parameter c, mapping hyperbolic-space vectors to the tangent space at the origin, α_ij is the attention weight of node i with respect to neighborhood node j, N(i) is the set of neighborhood nodes of node i, and h_j is the original hyperbolic feature vector of neighborhood node j; different modalities are learned through parallel HLGAtt branches, with the node update formulas: z_i^(l) = W^(l) h_i^(l−1); α_ij^(l) = Att_L(z_i^(l), z_j^(l)); h_i^(l) = Agg_L({α_ij^(l), z_j^(l) : j ∈ N(i)}); wherein l is the layer index of the network, h_i^(l−1) is the Lorentz-space feature vector of node i at layer l−1, W^(l) is a linear transformation function adapted to Lorentz geometry, z_i^(l) is the intermediate feature of node i at layer l after the linear transformation, Att_L is the attention function in Lorentz space, α_ij^(l) is the attention weight of node i to neighborhood node j at layer l, Agg_L is the feature aggregation function in Lorentz space that fuses neighborhood features based on the attention weights, N(i) is the set of neighborhood nodes of node i, and h_i^(l) is the final aggregated Lorentz-space feature of node i at layer l.
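The Lorentz-space machinery of claim 7 reduces, for curvature c = 1, to the following standard maps (an illustrative sketch; the patent's curvature parameter and learned branches are omitted):

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentz inner product of claim 7: <x,y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + x[1:] @ y[1:]

def exp_map0(v):
    """Exponential map at the hyperboloid origin o = (1, 0, ..., 0),
    curvature c = 1 assumed: lifts a tangent vector v = (0, v_spatial)
    onto the Lorentz hyperboloid <x,x>_L = -1."""
    o = np.zeros_like(v)
    o[0] = 1.0
    r = np.linalg.norm(v[1:])
    if r == 0:
        return o
    return np.cosh(r) * o + np.sinh(r) * v / r

def log_map0(x):
    """Logarithmic map at the origin: inverse of exp_map0, returning a
    tangent vector at o whose norm equals the hyperbolic distance to x."""
    d = np.arccosh(np.clip(x[0], 1.0, None))
    u = x.copy()
    u[0] = 0.0
    n = np.linalg.norm(u)
    return np.zeros_like(x) if n == 0 else d * u / n
```

Claim 7's aggregation then works entirely in the tangent space: neighbors are pulled down with `log_map0`, averaged with the attention weights, and the result is pushed back with `exp_map0`, which keeps the aggregate on the hyperboloid.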
- 8. The method of claim 1, wherein the anomaly detection and anomaly-cause localization based on the multi-modal features comprises: performing anomaly detection based on the graph attention network, updating node features by aggregating neighborhood information through multi-layer graph convolution: h_i^(l+1) = σ(Σ_{j∈N(i)} α_ij^(l) W^(l) h_j^(l)), wherein h_i^(l+1) is the layer-(l+1) feature vector output for node i after the layer-l computation, σ is the activation function, N(i) is the set of neighborhood nodes of node i, α_ij^(l) is the attention weight of node i to neighborhood node j at layer l, W^(l) is the learnable weight matrix of layer l used to linearly transform the neighborhood-node features, and h_j^(l) is the input feature vector of neighborhood node j at layer l; adopting a semi-supervised learning framework that jointly optimizes supervised cross-entropy loss and unsupervised contrastive loss: L = L_CE + λ_1 L_reg + λ_2 L_con, wherein L is the total loss function of model training, λ_1 and λ_2 are balancing hyperparameters adjusting the contribution ratios of the different losses in the total loss, L_CE is the cross-entropy loss, L_reg is the regularization loss, and L_con is the contrastive loss, which draws similar behavioral features close and pushes abnormal features apart: L_con = −Σ_{p∈P} log(exp(sim(z, z_p)/τ) / Σ_{n∈N} exp(sim(z, z_n)/τ)), wherein z is the anchor sample, P is the positive sample set corresponding to the anchor sample, N is the negative sample set corresponding to the anchor sample and belonging to different classes from it, sim is a similarity function, τ is a temperature parameter adjusting the smoothness of the similarity distribution (the smaller τ, the stronger the distinction), and exp is the exponential function mapping a similarity value to a positive weight; after an anomaly is detected, locating the anomaly root cause on the multi-modal fusion graph: the transition probability from node i to node j is defined as P_ij = w_ij / Σ_{k∈N(i)} w_ik, wherein P_ij is the transition probability from node i to node j with value range [0, 1], w_ij is the edge weight between node i and node j representing their association strength, N(i) is the set of neighbor nodes of node i, and the denominator is the sum of the edge weights between node i and all its neighbors; counting the visit frequencies of all nodes over multiple random walks generates the root-cause weight distribution S(v) = C(v) / Σ_{u∈V} C(u), wherein S(v) is the root-cause score of node v, measuring the likelihood that node v is the fault root cause, C(v) is the number of times node v is visited, V is the total node set of the multi-modal fusion graph, and the denominator is the total visit count of all nodes, normalizing the visit count of a single node; after the weights are sorted, the top K root causes are output.
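The random-walk localization step of claim 8 can be sketched directly from its two formulas (illustrative only; walk length, restart policy, and K are choices the claim leaves open):

```python
import numpy as np

def root_cause_scores(W, start, steps=500, seed=0):
    """Claim 8's localization: random walks over the fusion graph with
    transition probability P_ij = w_ij / sum_k w_ik, then visit counts
    normalized into a root-cause score S(v) = C(v) / sum_u C(u)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transitions
    counts = np.zeros(n)
    node = start
    for _ in range(steps):
        node = rng.choice(n, p=P[node])    # one random-walk step
        counts[node] += 1
    return counts / counts.sum()           # root-cause distribution
```

Sorting the returned scores and taking the top K entries yields the claim's "first K main root causes".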
- 9. The method of claim 1, further comprising analyzing long-term behavior patterns: modeling the daily behaviors of the elderly as a hidden-state sequence, wherein the hidden states represent behavior modes and the observations are multi-modal sensor data; given an observation sequence O = (o_1, …, o_T), the likelihood probability is calculated by the forward algorithm: P(O | λ) = Σ_{q_1,…,q_T} π_{q_1} b_{q_1}(o_1) Π_{t=2}^{T} a_{q_{t−1} q_t} b_{q_t}(o_t), wherein P(O | λ) is the likelihood probability of the observation sequence O under the model λ, π is the initial-state probability vector, with π_{q_1} the probability of being in state q_1 at the initial moment, A is the state-transition probability matrix, with a_{q_{t−1} q_t} the probability of transitioning from state q_{t−1} to q_t, B is the observation emission probability matrix, with b_{q_t}(o_t) the probability that state q_t generates observation o_t, and q_1, …, q_T is a hidden-state sequence of length T; when the normal pattern is deviated from, an alarm is issued to the user.
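The forward algorithm named in claim 9 evaluates that sum efficiently by recursion rather than by enumerating all state sequences; a minimal sketch with illustrative integer-coded observations:

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Forward algorithm of claim 9: P(O | lambda) for an HMM with
    initial distribution pi, transition matrix A (rows: from-state),
    and emission matrix B (rows: state, columns: observation symbol)."""
    alpha = pi * B[:, obs[0]]              # initialization, t = 1
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # induction step
    return alpha.sum()                     # termination
```

Deviation from the normal pattern can then be flagged by thresholding this likelihood (or its log) against values observed during normal days, which is the alarm condition the claim describes.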
- 10. A system for recognizing abnormal behaviors of elderly people based on multi-modal fusion, characterized by comprising: a data acquisition module configured to synchronously acquire visual, inertial, and environmental sensor data; a preprocessing module configured to perform time alignment, normalization, missing-value filling, and feature extraction, and to output multi-modal feature vectors; a cross-modal attention adapter module configured to dynamically weight and fuse the multi-modal feature vectors, generating adaptive fusion features; a graph construction module configured to construct a multi-modal fusion graph with the adaptive fusion features as nodes and the inter-modal similarity as edge weights; a hyperbolic Lorentz graph attention mechanism module configured to update node representations in hyperbolic space and output hyperbolic fusion features; an anomaly detection module configured to classify the hyperbolic fusion features based on a graph attention network to obtain the abnormal-behavior class; a root-cause localization module configured to compute and output the root cause of an anomaly using a random walk algorithm when an anomaly occurs; and an early-warning interface module configured to push the anomaly category and root-cause information to the monitoring terminal.
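The claim-10 system is a linear chain of modules, each consuming the previous module's output. A hypothetical skeleton of that wiring (the class and method names are illustrative, not from the patent):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class ElderCarePipeline:
    """Illustrative skeleton of the claim-10 module chain: acquisition,
    preprocessing, fusion, graph construction, hyperbolic attention,
    detection, root-cause localization, and early warning would each be
    registered as one pluggable stage."""
    stages: List[Callable[[Any], Any]] = field(default_factory=list)

    def add(self, stage: Callable[[Any], Any]) -> "ElderCarePipeline":
        """Register the next processing module and return self for chaining."""
        self.stages.append(stage)
        return self

    def run(self, raw: Any) -> Any:
        """Push raw sensor data through every registered module in order."""
        out = raw
        for stage in self.stages:
            out = stage(out)
        return out
```

This chain-of-stages shape makes each module independently replaceable, which matches the claim's module-by-module phrasing.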
Description
Multi-mode fusion-based senile abnormal behavior identification method and system
Technical Field
The invention relates to the technical field of intelligent elderly care and health monitoring, and in particular to a method and system for identifying abnormal behaviors of elderly people based on multi-modal fusion.
Background
As global population aging deepens, abnormal behaviors of the elderly in daily activities (such as falls, missed medication, and disrupted sleep-wake routines) have become key causes of major health and safety incidents, so accurate and efficient monitoring of such abnormal behaviors is of great significance for safeguarding the health and safety of the elderly and promoting the development of the intelligent care industry. Traditional monitoring schemes for elderly abnormal behavior mostly adopt single-modality sensing, relying on a single device such as a visual camera or a wearable accelerometer. However, the limitations of single-modality monitoring are significant: in real elderly-care scenarios with abrupt illumination changes, occlusion of the monitored target, blind spots in the monitoring view, or irregular wearing of the wearable device, the detection accuracy of abnormal behaviors drops rapidly, false alarms and missed detections readily occur, and the all-weather, high-reliability monitoring requirements of elderly-care scenarios are difficult to meet.
In recent years, multi-modal fusion techniques have gradually been introduced into the field of elderly abnormal-behavior recognition; by combining multi-source heterogeneous information such as visual, inertial, and environmental sensing data, they can in principle effectively improve the robustness of abnormal-behavior monitoring. However, existing multi-modal fusion methods have obvious technical shortcomings: most adopt simple early feature concatenation or late decision fusion, and cannot specifically address core problems such as sampling-frequency differences between heterogeneous modalities, imbalanced information-weight allocation, and asynchronous temporal features, so the fusion efficiency of multi-modal data is low, the discriminative power of the fused features is insufficient, and the complementary advantages of multi-source data cannot be fully exploited. In addition, mainstream abnormal-behavior recognition models generally build deep learning networks in Euclidean space for feature computation and modeling, yet the evolution of elderly behavior patterns from normal through sub-abnormal to abnormal has naturally hierarchical structural features that the linear modeling of Euclidean space struggles to describe accurately; meanwhile, existing models output only the class label of an abnormal behavior and cannot trace the specific sensing-data source or environmental factor that induced it, leaving a serious interpretability gap. These technical defects greatly limit the large-scale deployment and application of abnormal-behavior monitoring technology in real elderly-care scenarios.
In summary, there is a need in the art for an end-to-end technical solution for monitoring abnormal behaviors of the elderly that can adaptively balance the contribution weights of multi-modal data, model behavioral features in a space with hierarchical expressive capability, and accurately locate the root cause of an anomaly, so as to break through the application bottlenecks of the prior art.
Disclosure of Invention
The method comprises: synchronously collecting three types of heterogeneous data (visual, inertial, and environmental) at the edge; extracting features for each modality after time alignment and preprocessing; dynamically correcting the weights of the visual and inertial features through a cross-modal attention adapter (CFA) to relieve modal-information imbalance; constructing the enhanced features into a multi-modal fusion graph; executing a graph attention mechanism (HLGAtt) in hyperbolic Lorentz space to describe the hierarchical relationship between normal and abnormal behaviors; completing anomaly detection based on a graph attention network; locating the root-cause nodes of the anomaly on the fusion graph with a random walk algorithm; and finally outputting the anomaly category and a root-cause explanation. The invention provides a multi-modal fusion-based method for identifying abnormal behaviors of the elderly, comprising the following steps: Collec