EP-3751461-B1 - IMAGE RECOGNITION METHOD, AND IMAGE PRESENTATION TIME ADJUSTMENT METHOD AND DEVICE

Inventors

YANG, HUI
YUAN, PENG
TANG, WEIDONG
PENG, SHUAIHUA

Dates

Publication Date: 2026-05-06
Application Date: 2019-03-01

Claims (12)

A brain-computer combination image recognition method based on image sequence presentation, comprising: setting a presentation time sequence corresponding to an image sequence, wherein the image sequence comprises N images, N is a positive integer, the presentation time sequence comprises a presentation time of each image in the image sequence, a presentation time of an image i is used to indicate a time period from a presentation start moment of the image i to a presentation start moment of a next adjacent image, the image i is any image in the image sequence, the presentation time sequence comprises at least two unequal presentation times, a difference between any two presentation times of the at least two unequal presentation times is k × Δ, k is a positive integer, and Δ is a preset time period value; processing the image sequence by using a computer vision algorithm, to obtain a computer vision signal corresponding to each image in the image sequence; obtaining a feedback signal, collected by a feedback signal collection device, that is generated when an observation object watches the image sequence displayed in the presentation time sequence and that corresponds to each image in the image sequence, wherein the feedback signal is used to indicate a reaction of the observation object to the watched image, the feedback signal being a biological signal of the observation object collected by the feedback signal collection device and is any one or more of: an electroencephalogram signal, and an eye movement signal; and fusing, for each image in the image sequence, a corresponding computer vision signal and a corresponding feedback signal to obtain a target recognition signal of each image in the image sequence, wherein the target recognition signal is used for image recognition; wherein the computer vision signal is a probability that the target image includes a preset image feature or an image feature of the target image, and wherein after the feedback signal of the observation object is obtained, the probability that the target image includes the preset image feature is determined based on the feedback signal, or a feedback signal feature of the feedback signal is extracted; wherein the setting a presentation time sequence corresponding to an image sequence comprises: determining a corresponding presentation time for each image in the image sequence based on a duration impact parameter, to obtain the presentation time sequence corresponding to the image sequence; wherein the duration impact parameter comprises a first recognition probability and a fatigue state parameter, the first recognition probability is used to indicate the probability, obtained through calculation by using the computer vision algorithm, that the image comprises the preset image feature, the fatigue state parameter is used to indicate a fatigue degree of the observation object when the observation object observes an image, the presentation time is inversely correlated with the first recognition probability, and the presentation time is positively correlated with the fatigue state parameter; and wherein the determining a corresponding presentation time for each image in the image sequence based on a duration impact parameter comprises: for each image in the image sequence, finding a presentation time corresponding to the first recognition probability and the fatigue parameter comprised in the duration impact parameter, from a mapping table, wherein the mapping table comprises a plurality of first recognition probability values, a plurality of fatigue parameter values, and presentation times respectively corresponding to the recognition probability values and the plurality of fatigue parameter values; or for each image in the image sequence, obtaining the presentation time by using a fitting formula, the fitting formula comprising a plurality of first recognition probability values and a plurality of fatigue parameter values.
The method according to claim 1, wherein before the setting a presentation time sequence corresponding to an image sequence, the method further comprises: receiving M images from a camera device, wherein M is an integer greater than 1; and selecting N images from the M images as the image sequence, wherein N is less than or equal to M.
The method according to claim 1 or 2, wherein the obtaining a feedback signal that is generated when an observation object watches the image sequence displayed in the presentation time sequence and that corresponds to each image in the image sequence comprises: in a process of displaying the image sequence in the presentation time sequence, obtaining the fatigue state parameter corresponding to an image j, and adjusting, based on the fatigue state parameter corresponding to the image j, a presentation time, in the presentation time sequence, corresponding to an image to be displayed after the image j in the image sequence, wherein the image j is any image in the image sequence.
The method according to claim 3, wherein the obtaining the fatigue state parameter corresponding to the image j comprises: obtaining the fatigue state parameter based on fatigue state information that is sent by a sensor and that is obtained when the observation object watches the image j, wherein the sensor is at least one of an electroencephalogram collection device used as a sensor for measuring fatigue status information, or a sensor for detecting an eye movement.
The method according to any one of claims 1 to 4, further comprising: when it is detected that a corresponding fatigue state parameter obtained when the observation object observes an image q is greater than or equal to a first fatigue threshold, controlling to stop displaying images to be displayed after the image q in the image sequence, and obtaining an image whose corresponding first recognition probability is greater than or equal to a first probability threshold in the images to be displayed after the image q, wherein the image q is any image in the image sequence; and when it is detected that the fatigue state parameter of the observation object is less than or equal to a second fatigue threshold, controlling to sequentially display the image whose first recognition probability is greater than or equal to the first probability threshold in the images to be displayed after the image q.
The method according to any one of claims 1 to 4, wherein there are at least two observation objects, and the fusing, for each image in the image sequence, a corresponding computer vision signal and a corresponding feedback signal to obtain a target recognition signal of each image in the image sequence comprises: fusing, for each image in the image sequence, a corresponding computer vision signal and at least two corresponding feedback signals to obtain a target recognition signal of each image in the image sequence.
The method according to claim 6, wherein the fatigue state parameter comprises at least two fatigue state parameters respectively generated when the at least two observation objects observe a same image.
The method according to any one of claims 1 to 4, wherein the fusing, for each image in the image sequence, a corresponding computer vision signal and a corresponding feedback signal to obtain a target recognition signal of each image in the image sequence comprises: determining, for each image in the image sequence based on at least one of the first recognition probability, the fatigue state parameter, and the presentation time, a first weight corresponding to each image in the image sequence, wherein the first weight is a weight used when the corresponding feedback signal is used to determine the target recognition signal, the first weight is inversely correlated with the first recognition probability, the first weight is inversely correlated with the fatigue state parameter, and the first weight is positively correlated with the presentation time; and fusing, for each image in the image sequence based on a corresponding first weight, a corresponding computer vision signal and a corresponding feedback signal to obtain the target recognition signal of each image in the image sequence, increasing or decreasing a first weight used by recognition by using the computer vision algorithm and a first weight used by brain recognition of an observation object is decreased or increased.
The method according to any one of claims 1 to 4, wherein when the computer vision signal is a first recognition probability determined by using the computer vision algorithm; before the fusing, for each image in the image sequence, a corresponding computer vision signal and a corresponding feedback signal to obtain a target recognition signal of each image in the image sequence, the method further comprises: calculating, for each image in the image sequence, a second recognition probability of each image in the image sequence based on a corresponding feedback signal, wherein the second recognition probability is used to indicate a probability that the observation object determines that the image comprises the preset image feature; and the fusing, for each image in the image sequence, a corresponding computer vision signal and a corresponding feedback signal to obtain a target recognition signal of each image in the image sequence comprises: calculating, for each image in the image sequence, a target recognition probability of each image in the image sequence based on the corresponding first recognition probability and the corresponding second recognition probability.
The method according to any one of claims 1 to 4, wherein when the computer vision signal is an image feature determined by using the computer vision algorithm; before the fusing, for each image in the image sequence, a corresponding computer vision signal and a corresponding feedback signal to obtain a target recognition signal of each image in the image sequence, the method further comprises: determining, for each image in the image sequence based on a corresponding feedback signal, a feedback signal feature corresponding to each image in the image sequence; and the fusing, for each image in the image sequence, a corresponding computer vision signal and a corresponding feedback signal to obtain a target recognition signal of each image in the image sequence comprises: performing, for each image in the image sequence, feature fusion on the corresponding image feature and the corresponding feedback signal feature, to obtain a fused feature corresponding to each image in the image sequence; and determining, for each image in the image sequence, a target recognition probability of each image in the image sequence based on the corresponding fused feature.
An image recognition device comprising: means for carrying out steps of the method of any one of claims 1 to 10.
A computer readable storage medium, wherein the storage medium is configured to store an instruction, and when the instruction is run, a computer is enabled to execute the method according to any one of claims 1 to 10.
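
For illustration only, the mapping-table alternative of claim 1 and a probability-level fusion in the spirit of claim 9 might be sketched as follows. This is a minimal, non-limiting sketch under stated assumptions: the value of Δ, the base duration, the bin edges, the table entries, the function names, and the linear weighting used for fusion are all hypothetical and are not taken from the claims or the description. The table is filled so that the presentation time falls as the first recognition probability rises and grows as the fatigue state parameter rises, and every duration differs from another by an integer multiple of Δ.

```python
from bisect import bisect_right

# All constants below are hypothetical values chosen for illustration.
DELTA_MS = 50           # assumed preset time period value (Δ)
BASE_MS = 100           # assumed shortest presentation time
PROB_EDGES = [0.3, 0.7]  # probability bins: [0, 0.3), [0.3, 0.7), [0.7, 1]
FATIGUE_EDGES = [0.5]    # fatigue bins: [0, 0.5), [0.5, 1]

# Mapping table of step counts k; presentation time = BASE_MS + k * DELTA_MS.
# Rows: fatigue bin (low, high). Columns: probability bin (low, mid, high).
STEP_TABLE = [
    [4, 2, 0],
    [6, 4, 2],
]


def presentation_time_ms(first_prob, fatigue):
    """Look up a presentation time from the (hypothetical) mapping table."""
    col = bisect_right(PROB_EDGES, first_prob)
    row = bisect_right(FATIGUE_EDGES, fatigue)
    return BASE_MS + STEP_TABLE[row][col] * DELTA_MS


def presentation_time_sequence(first_probs, fatigue):
    """Build the presentation time sequence for a whole image sequence."""
    return [presentation_time_ms(p, fatigue) for p in first_probs]


def fuse_probabilities(first_prob, second_prob, first_weight):
    """Assumed linear fusion: first_weight is applied to the feedback-signal
    probability, and the remainder to the computer vision probability."""
    return first_weight * second_prob + (1.0 - first_weight) * first_prob


if __name__ == "__main__":
    times = presentation_time_sequence([0.9, 0.5, 0.1], fatigue=0.2)
    print(times)
    print(fuse_probabilities(0.8, 0.4, first_weight=0.25))
```

A fitting formula could replace the table lookup in `presentation_time_ms` without changing the rest of the sketch, provided its result is rounded to a multiple of Δ above the base duration.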

Description

TECHNICAL FIELD

Embodiments of the present invention relate to the field of information technologies, and in particular, to an image recognition method and device, and an image presentation time adjustment method and device.

BACKGROUND

In the current information age, people share abundant information resources but also often encounter the problem of "information overload" or "information explosion". How to efficiently select the most needed information from massive information resources is an important topic in the information era. In the image field, image recognition is one of the problems of greatest concern.

Image recognition may be implemented by using a computer vision algorithm. The computer vision algorithm may be a conventional image detection algorithm, or may be a deep learning algorithm based on an artificial neural network. The conventional image detection algorithm extracts image features from an image area and uses an image classification algorithm to classify each image according to whether it is a target image. The deep learning algorithm based on the artificial neural network may train an initial convolutional neural network by using training samples, adjust parameters in the initial convolutional neural network so that the image recognition error converges, thereby constructing a new convolutional neural network, and then predict, by using the new convolutional neural network, a probability that an image is a target image, so as to perform image recognition.

Both the conventional image detection algorithm and the deep learning algorithm based on the artificial neural network have the following disadvantages: First, it may be difficult to obtain training data of a specific type, which causes unbalanced distribution of training samples. Second, the training data is noisy, which causes a large error in the algorithm.
In addition, some features of the image, for example, high-order semantic features, are difficult to extract.

Compared with the computer vision algorithm, the human brain has abundant cognition and a priori knowledge. Feature extraction by the human brain does not depend on the amount of training data and is not affected by unbalanced sample distribution. The human brain also often exhibits strong stability even under the impact of noise. Moreover, the human brain's experience and its high-level semantic understanding and inference abilities can be used to find some obscure high-level features. However, the human brain has some disadvantages in target image recognition, for example, relatively low efficiency. Persons skilled in the art may therefore consider combining the advantages of the human brain and the computer, and performing image recognition through brain-computer coordination, that is, collaboration between the human brain and the computer vision algorithm.

When the brain collaborates with the computer on target image recognition, an image sequence based on a rapid serial visual presentation (RSVP) paradigm may be used as an external stimulus of the human brain. When a person observes the image sequence, the electroencephalogram (EEG) signals obtained when the person observes a target image have different features from those obtained when the person observes a common image. Electroencephalogram signals obtained when the human brain observes an image sequence can be collected and analyzed, and an image feature of each image in the image sequence can be extracted by using the computer vision algorithm. For each image in the image sequence, whether the image is a target image may be recognized based on an electroencephalogram signal and an image feature. Currently, a time interval between images in an image sequence based on RSVP is determined according to experience or an experiment.
However, because the human brain is prone to fatigue and its attention resources are limited, the missed detection rate of brain-computer collaboration image recognition is still relatively high, resulting in relatively low efficiency of brain-computer collaboration image recognition.

WO 2016/193979 A1 discloses a Brain Computer Interface (BCI), a method and a system for classification of an image.

SUMMARY

Embodiments of this application disclose a brain-computer combination image recognition method and device based on image sequence presentation, and an image presentation time adjustment method and device, so as to improve efficiency of brain-computer combination image recognition.

The invention is defined in the independent claims. Additional features of the invention are provided in the dependent claims. In the following, parts of the description and drawings referring to embodiments that are not covered by the claims are not embodiments of the invention, but are examples useful for understanding the invention.

According to a first aspect, an embodiment of this application provides a brain-computer co