
CN-122025125-A - Dual-view autism detection method based on an asynchronous brain-function prior


Abstract

The invention relates to the technical field of computer-aided diagnosis, in particular to a dual-view autism detection method and system based on an asynchronous brain-function prior. The method first acquires an original fNIRS blood-oxygen data sequence and dual-view video data of a subject during an experimental task. It constructs a global neural encoder with a VGG-like deep one-dimensional convolutional architecture to extract spatio-temporal features of brain function and generate a global neural feature vector; constructs a cross-modal channel attention generation network to produce a channel attention weight vector; injects that weight vector into the self-attention module used for video feature extraction to perform channel-level dynamic calibration; computes a difference characterization vector with a time-lag-aware dual-stream feature alignment module; and outputs an autism spectrum disorder prediction. The invention can dynamically suppress pathology-irrelevant random action noise in the video without requiring strict time synchronization, while amplifying behavioral deficit features associated with neural abnormality.

Inventors

  • LIU YONGJIN
  • JI WENQI

Assignees

  • Tsinghua University

Dates

Publication Date
2026-05-12
Application Date
2026-01-26

Claims (10)

  1. A dual-view autism detection method based on an asynchronous brain-function prior, comprising the steps of: S1, acquiring an original fNIRS blood-oxygen data sequence and dual-view video feature data of a subject during an experimental task as input to a global neural encoder; S2, constructing the global neural encoder with a VGG-like deep one-dimensional convolutional architecture, extracting spatio-temporal features of brain function through layer-by-layer stacked temporal convolution-pooling blocks, and generating a global neural feature vector by global average pooling; S3, constructing a cross-modal channel attention generation network that maps the global neural feature vector into a space matching the channel dimension of the video features and generates a channel attention weight vector through a Sigmoid activation function; S4, injecting the channel attention weight vector into the self-attention module used for video feature extraction, and performing channel-level dynamic calibration of the child-view video feature encoding based on this weight vector; S5, computing a difference characterization vector from the calibrated child-view video features and the examiner-view video features with a time-lag-aware dual-stream feature alignment module, and outputting an autism spectrum disorder prediction through a classifier.
  2. The method of claim 1, wherein the global neural encoder comprises a group of at least two convolution blocks, each consisting of a one-dimensional convolution layer, a batch normalization layer, and an ELU activation function, with adjacent convolution blocks connected by a max-pooling layer.
  3. The method of claim 1, wherein S3 further comprises: S31, constructing, with a multi-layer perceptron, a mapping network from the global neural feature vector to the video feature encoder, and projecting the global neural feature vector to the same dimension as the hidden channel dimension of the video feature encoding to obtain a projection result; S32, feeding the projection result into a Sigmoid activation function to generate the channel attention weight vector.
  4. The method of claim 1, wherein S4 further comprises: S41, encoding the dual-view feature sequences, comprising the examiner-view and child-view feature sequences, separately with independent linear embedding layers and independent multi-head self-attention modules; S42, weighting the child-view feature sequence to realize channel-level dynamic calibration of the dual-view video feature sequences, with the weighting formula: V′_c = w ⊙ V_c, where V_c is the child-view feature sequence and w is the channel attention weight vector.
  5. The method of claim 4, wherein S5 further comprises: S51, processing the examiner-view feature sequence encoding and the weighted child-view feature sequence with a limited cross-sequence attention mechanism to obtain a difference characterization vector between the examiner view and the child view; S52, processing the difference characterization vector with linear projection and attention pooling, and outputting, through a classifier formed of fully connected layers with a Softmax function, the predicted probability that the sample belongs to autism spectrum disorder or tic disorder.
  6. The method of claim 5, wherein the difference characterization vector of examiner and child in step S51 is obtained as follows: taking the weighted child-view feature sequence as the query and the examiner-view dual-stream feature sequence as the key/value, computing an attention matrix within a local time window to generate an aligned examiner feature A and an explicit action time-lag feature L, and combining A and L to obtain the examiner-child difference characterization vector d.
  7. The method of claim 1, wherein the dual-view video feature data comprises skeletal keypoint sequences, optical flow features, or gesture heat-map features extracted from pre-processed imitation-action samples of the examiner view and the child view.
  8. A dual-view autism detection system based on an asynchronous brain-function prior, comprising: a data acquisition module for acquiring an original fNIRS blood-oxygen data sequence and dual-view video data of a subject during an experimental task as input to a global neural encoder; a global neural feature extraction module for constructing the global neural encoder with a VGG-like deep one-dimensional convolutional architecture, extracting spatio-temporal features of brain function through layer-by-layer stacked temporal convolution-pooling blocks, and generating a global neural feature vector by global average pooling; a cross-modal channel attention generation module for constructing a cross-modal channel attention generation network, mapping the global neural feature vector into a space matching the channel dimension of the video features, and generating a channel attention weight vector through a Sigmoid activation function; a channel-level dynamic calibration module for injecting the channel attention weight vector into the self-attention module used for video feature extraction and performing channel-level dynamic calibration of the child-view video feature encoding based on this weight vector, so as to enhance behavioral features related to neural abnormality and suppress random action noise; and a time-lag-aware dual-stream feature alignment module for computing a difference characterization vector from the calibrated child-view video features and the examiner-view video features and outputting an autism spectrum disorder prediction through a classifier.
  9. The system of claim 8, wherein the cross-modal channel attention generation module is further configured to: construct, with a multi-layer perceptron, a mapping network from the global neural feature vector to the video feature encoder, projecting the global neural feature vector to the same dimension as the hidden channel dimension of the video feature encoding; and feed the projection result into a Sigmoid activation function to generate the channel attention weight vector.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1-7.
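
Steps S3-S4 of claim 1 (detailed in claims 3-4) map the fNIRS-derived global neural feature vector through a multi-layer perceptron and a Sigmoid to channel attention weights, then recalibrate the child-view video features channel-wise. A minimal NumPy sketch of that calibration follows; the one-hidden-layer perceptron with ReLU and all dimensions are illustrative assumptions, as the claims do not fix them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(g, W1, b1, W2, b2):
    # MLP maps the global neural vector g (d_n,) to channel weights w (C,)
    h = np.maximum(0.0, g @ W1 + b1)   # hidden layer (ReLU is an assumption)
    return sigmoid(h @ W2 + b2)        # Sigmoid keeps each weight in (0, 1)

def calibrate(child_feats, w):
    # channel-level dynamic calibration: broadcast multiply over the time axis
    return child_feats * w             # (T, C) * (C,) -> (T, C)

rng = np.random.default_rng(1)
d_n, hidden, C, T = 32, 48, 64, 100    # illustrative sizes
g = rng.standard_normal(d_n)           # stands in for the fNIRS encoder output
W1 = rng.standard_normal((d_n, hidden)) * 0.1; b1 = np.zeros(hidden)
W2 = rng.standard_normal((hidden, C)) * 0.1;   b2 = np.zeros(C)
child = rng.standard_normal((T, C))    # stands in for child-view video features
w = channel_attention(g, W1, b1, W2, b2)
calibrated = calibrate(child, w)
```

Because each weight lies in (0, 1), channels the neural prior deems uninformative are attenuated rather than zeroed, which matches the stated goal of suppressing, not deleting, random action noise.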

Description

Dual-view autism detection method based on an asynchronous brain-function prior

Technical Field

The invention relates to the technical field of computer-aided diagnosis, in particular to a dual-view autism detection method, system, and computer-readable storage medium based on an asynchronous brain-function prior.

Background

Autism spectrum disorder (ASD) is a common neurodevelopmental disorder whose major features include social communication difficulties, restricted interests, and repetitive, stereotyped behaviors, which may negatively impact social learning and development; reliable, scalable ASD assessment is therefore of great importance for early intervention and therapy planning. Traditional ASD diagnostics rely primarily on clinical scales and expert behavioral observation, which are time-consuming and subjective. Video-based ASD detection methods have advanced in recent years owing to their non-invasiveness and suitability for large-scale screening. Building on multi-modal data that include video, multi-modal fusion frameworks proposed in part of the literature have demonstrated the feasibility of auxiliary diagnosis with asynchronously acquired multi-modal data. Most existing video-analysis ASD detection methods use only single-view video data and focus on one class of behavioral features of the individual child, such as facial expression changes or hand gestures. The single-view approach has difficulty capturing the dyadic dynamics of imitative interaction, because the child's behavior can be interpreted correctly only in combination with the experimenter's demonstrated actions. In addition, existing multi-modal fusion frameworks generally extract features separately for modalities such as video and neurophysiological data and then complete classification from the concatenated fusion features after simple feature splicing in a latent space.
This approach lacks modeling of the neurophysiological mechanism by which brain neural states give rise to the behaviors observed in the video data.

Disclosure of Invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, a first object of the present invention is to propose a dual-view autism detection method based on an asynchronous brain-function prior, comprising: S1, acquiring an original fNIRS blood-oxygen data sequence and dual-view video feature data of a subject during an experimental task as input to a global neural encoder; S2, constructing the global neural encoder with a VGG-like deep one-dimensional convolutional architecture, extracting spatio-temporal features of brain function through layer-by-layer stacked temporal convolution-pooling blocks, and generating a global neural feature vector by global average pooling; S3, constructing a cross-modal channel attention generation network that maps the global neural feature vector into a space matching the channel dimension of the video features and generates a channel attention weight vector through a Sigmoid activation function; S4, injecting the channel attention weight vector into the self-attention module used for video feature extraction, and performing channel-level dynamic calibration of the child-view video feature encoding based on this weight vector, so as to enhance behavioral features related to neural abnormality and suppress random action noise; S5, computing a difference characterization vector from the calibrated child-view video features and the examiner-view video features with a time-lag-aware dual-stream feature alignment module, and outputting an autism spectrum disorder prediction through a classifier.
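
Step S5 and claims 5-6 describe a limited cross-sequence attention within a local time window, taking the calibrated child-view features as queries and the examiner-view features as keys/values, to produce aligned examiner features and explicit action time-lag features. A hedged NumPy sketch of one plausible realization follows; the scaled dot-product attention, the expected-offset lag feature, and the concatenated difference representation are assumptions, since the text does not fix the exact combination:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def lag_aware_alignment(child, examiner, window=4):
    # child:    (T, C) calibrated child-view features (queries)
    # examiner: (T, C) examiner-view features (keys/values)
    T, C = child.shape
    aligned = np.empty_like(child)
    lags = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        scores = examiner[lo:hi] @ child[t] / np.sqrt(C)  # local attention scores
        a = softmax(scores)
        aligned[t] = a @ examiner[lo:hi]                  # aligned examiner feature
        lags[t] = a @ (np.arange(lo, hi) - t)             # expected imitation time lag
    # difference representation: per-step residual plus explicit lag channel
    diff = np.concatenate([child - aligned, lags[:, None]], axis=1)  # (T, C+1)
    return diff, aligned, lags

rng = np.random.default_rng(2)
T, C = 50, 16
child = rng.standard_normal((T, C))
examiner = rng.standard_normal((T, C))
diff, aligned, lags = lag_aware_alignment(child, examiner, window=4)
```

Restricting attention to a ±window neighborhood is what makes the mechanism "limited": the child's imitation is assumed to trail or lead the examiner's demonstration by at most a few frames, so no strict frame-level synchronization of the two views is needed.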
In one embodiment of the invention, the global neural encoder comprises a group of at least two convolution blocks, each comprising a one-dimensional convolution layer, a batch normalization layer, and an ELU activation function, with adjacent convolution blocks connected by a max-pooling layer. In one embodiment of the present invention, step S3 further includes: S31, constructing, with a multi-layer perceptron, a mapping network from the global neural feature vector to the video feature encoder, and projecting the global neural feature vector to the same dimension as the hidden channel dimension of the video feature encoding to obtain a projection result; S32, feeding the projection result into a Sigmoid activation function to generate the channel attention weight vector. In one embodiment of the present invention, S4 further includes: S41, encoding the dual-view feature sequences separately with independent linear embedding
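
The VGG-like one-dimensional global neural encoder described above (stacked blocks of Conv1D + BatchNorm + ELU, max-pooling between blocks, then global average pooling) can be sketched in NumPy as follows. The kernel sizes, channel counts, and inference-style normalization statistics are illustrative assumptions, not the patent's actual hyperparameters:

```python
import numpy as np

def conv1d(x, w, b):
    # valid 1D convolution: x (C_in, T), w (C_out, C_in, k), b (C_out,)
    C_out, C_in, k = w.shape
    T_out = x.shape[1] - k + 1
    out = np.empty((C_out, T_out))
    for t in range(T_out):
        patch = x[:, t:t + k]  # (C_in, k)
        out[:, t] = np.tensordot(w, patch, axes=([1, 2], [0, 1])) + b
    return out

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def batch_norm(x, eps=1e-5):
    # per-channel normalization over time (stats taken from the input itself)
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def max_pool(x, k=2):
    T = x.shape[1] // k * k
    return x[:, :T].reshape(x.shape[0], -1, k).max(axis=2)

def global_neural_encoder(x, blocks):
    # blocks: list of (w, b) conv parameters; Conv1D-BN-ELU per block,
    # max-pooling between adjacent blocks, global average pooling at the end
    for i, (w, b) in enumerate(blocks):
        x = elu(batch_norm(conv1d(x, w, b)))
        if i < len(blocks) - 1:
            x = max_pool(x)
    return x.mean(axis=1)  # global neural feature vector (C_last,)

rng = np.random.default_rng(0)
fnirs = rng.standard_normal((8, 128))  # e.g. 8 fNIRS channels, 128 time points
blocks = [(rng.standard_normal((16, 8, 5)) * 0.1, np.zeros(16)),
          (rng.standard_normal((32, 16, 5)) * 0.1, np.zeros(32))]
g = global_neural_encoder(fnirs, blocks)
```

The global average pooling at the end is what allows the fNIRS stream to act as an asynchronous prior: it collapses the temporal axis, so the resulting vector conditions the video branch without requiring the two modalities to share a time base.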