EP-4742010-A1 - HUMAN-FACTORS INTELLIGENCE USER GAZE ANALYSIS METHOD, APPARATUS AND SYSTEM, AND EDGE COMPUTING DEVICE

EP 4742010 A1

Abstract

Provided are a human-factors intelligence user gaze analysis method, apparatus, and system, and an edge computing device, which belong to the technical field of computer vision. The method includes: collecting visual data of a user's field of view through a camera of a head-mounted device, and collecting eye movement data of the user within the field of view through an eye tracker of the head-mounted device; identifying a target object in the visual data; determining target eye movement data associated with gaze on the target object in the eye movement data; and sending the target eye movement data to a target screen to correspondingly present the target eye movement data on the target object displayed on the target screen. Embodiments of the present disclosure improve efficiency of user visual behavior analysis.

Inventors

  • ZHAO, Qichao
  • YANG, Ran
  • WANG, Qingju

Assignees

  • Kingfar International Inc.

Dates

Publication Date
2026-05-13
Application Date
2025-09-16

Claims (15)

  1. A human-factors intelligence user gaze analysis method, comprising: collecting visual data of a user's field of view through a camera of a head-mounted device, and collecting eye movement data of the user within the field of view through an eye tracker of the head-mounted device; identifying a target object in the visual data; determining target eye movement data associated with gaze on the target object in the eye movement data; and sending the target eye movement data to a target screen to correspondingly present the target eye movement data on the target object displayed on the target screen.
  2. The method according to claim 1, further comprising: determining a two-dimensional marker to be used based on a scene; and deploying the two-dimensional marker at a key point of the target object, and recording a corresponding relationship among the target object, the key point, and the two-dimensional marker; wherein said identifying the target object in the visual data comprises: locating a position of the target object in the visual data based on a position of the two-dimensional marker in the visual data and the corresponding relationship; wherein said determining the two-dimensional marker to be used based on the scene comprises: determining a size of a blank area in the two-dimensional marker based on the scene; the blank area being an area between an edge of the two-dimensional marker and a background area of the two-dimensional marker; and determining the two-dimensional marker to be used based on the size of the blank area; and/or wherein said determining the two-dimensional marker to be used based on the scene comprises: determining an area ratio of a background area in the two-dimensional marker to the two-dimensional marker based on the scene; and determining the two-dimensional marker to be used based on the area ratio.
  3. The method according to claim 2, wherein the target object comprises a plurality of key points, and said deploying the two-dimensional marker at the key point of the target object comprises: selecting a plurality of non-collinear key points from the target object; deploying the two-dimensional marker at each of the plurality of non-collinear key points, and recording a corresponding relationship among the target object, each of the plurality of non-collinear key points, and the two-dimensional marker; or selecting a plurality of key points from the target object to form a polygon; deploying the two-dimensional marker at each of the plurality of key points that form the polygon, and recording a corresponding relationship among the target object, each of the plurality of key points that form the polygon, and the two-dimensional marker.
  4. The method according to claim 2, wherein said locating the target object based on the position of the two-dimensional marker in the visual data and the corresponding relationship comprises: performing feature extraction on the visual data; detecting the two-dimensional marker comprised in the visual data and the position of the two-dimensional marker in the visual data based on an extracted feature; locating the target object in the visual data based on the corresponding relationship and a detection result of the two-dimensional marker and the position of the two-dimensional marker in the visual data; or obtaining, in response to detecting that the number of two-dimensional markers in the visual data is less than the number of deployed two-dimensional markers, a position of an undetected two-dimensional marker by fitting based on a geometric positional relationship of the key point; and locating the target object based on a position of a detected two-dimensional marker in the visual data, the obtained position of the undetected two-dimensional marker, and the corresponding relationship.
  5. The method according to claim 1, wherein said sending the target eye movement data to the target screen to correspondingly present the target eye movement data on the target object displayed on the target screen comprises: obtaining position information of the target eye movement data on the target object; performing identification on the visual data to obtain position information of each marker pre-marked on the target object; determining a transformation relationship based on the position information of each marker; and substituting the position information of the target eye movement data on the target object into the transformation relationship to obtain target coordinates of the target eye movement data on the target object in the target screen.
  6. The method according to claim 5, wherein said determining the transformation relationship based on the position information of each marker comprises: determining a first parameter set for calculating an abscissa of the target coordinates, a second parameter set for calculating an ordinate of the target coordinates, and a third parameter set for calculating a homogeneous coordinate normalization factor based on the position information of each marker; determining an expression of a first intermediate variable based on the third parameter set; obtaining a first expression for calculating the abscissa of the target coordinates based on the expression of the first intermediate variable and the first parameter set; and obtaining a second expression for calculating the ordinate of the target coordinates based on the expression of the first intermediate variable and the second parameter set, the first expression and the second expression constituting the transformation relationship.
  7. The method according to claim 5, wherein: said performing identification on the visual data to obtain the position information of each marker pre-marked on the target object comprises: identifying a marker area of each marker pre-marked on the target object from the visual data; and determining center point coordinates of each marker area in a coordinate system established based on the visual data to obtain the position information of each marker; wherein said determining the center point coordinates of each marker area comprises: for each marker area in the visual data, determining coordinates of each point in a contour of the marker area, and calculating average coordinates of all points in the contour of the marker area to obtain the center point coordinates of the marker area.
  8. The method according to claim 1, wherein said sending the target eye movement data to the target screen to correspondingly present the target eye movement data on the target object displayed on the target screen comprises: obtaining a first image block comprising the target eye movement data in a first video frame of the visual data; calculating, in response to the first image block satisfying a predetermined condition, a coordinate mapping relationship between the first video frame and a second video frame in the target screen based on the first image block; and mapping the target eye movement data to the second video frame based on the coordinate mapping relationship.
  9. The method according to claim 8, wherein said obtaining the first image block comprising the target eye movement data in the first video frame of the visual data comprises: determining a position of the target eye movement data in the first video frame; and determining an area within a first predetermined range centered on the position of the target eye movement data in the first video frame as the first image block; wherein: the predetermined condition comprises: the number of matching points in an image block being greater than or equal to a predetermined number, and/or a plurality of matching points in the image block being not collinear; the method further comprises: obtaining, in response to the first image block not satisfying the predetermined condition, a second image block comprising the target eye movement data in the first video frame; calculating, in response to the second image block satisfying the predetermined condition, the coordinate mapping relationship between the first video frame and the second video frame in the target screen based on the second image block.
  10. The method according to claim 8, wherein said obtaining the second image block comprising the target eye movement data in the first video frame comprises: determining a position of the target eye movement data in the first video frame; determining the area within a second predetermined range centered on the position of the target eye movement data in the first video frame as the second image block; the second predetermined range being larger than a first predetermined range.
  11. The method according to claim 8, further comprising, prior to obtaining the first image block comprising the target eye movement data in the first video frame of the visual data: performing feature point matching on the first video frame and the second video frame in the target screen, and reselecting, in response to a matching failure between the first video frame and the second video frame, a video frame from the visual data to replace the first video frame and performing the feature point matching on the reselected video frame and the second video frame; wherein said reselecting the video frame from the visual data to replace the first video frame and performing the feature point matching on the reselected video frame and the second video frame comprises: determining a video frame adjacent to the first video frame in the visual data or another video frame in the visual data as the reselected video frame; or wherein: said performing the feature point matching on the first video frame and the second video frame in the target screen comprises: inputting the first video frame and the second video frame into a predetermined neural network model, such that the neural network model extracts feature points of the first video frame and the second video frame and performs matching based on extracted feature points; or said performing the feature point matching on the first video frame and the second video frame in the target screen comprises: preprocessing the first video frame and the second video frame, wherein the preprocessing comprises removing feature points within a predetermined range of a boundary of the first video frame and removing feature points within a predetermined range of a boundary of the second video frame; and performing feature point matching on a pre-processed first video frame and a pre-processed second video frame; and/or inputting the pre-processed first video frame and the pre-processed second video frame into a predetermined neural network model, such that the neural network model extracts feature points of the preprocessed first video frame and the preprocessed second video frame and performs matching based on extracted feature points.
  12. The method according to claim 11, further comprising: storing the feature point of the second video frame extracted by the neural network model; wherein said reselecting the video frame from the visual data to replace the first video frame and performing feature point matching on the reselected video frame and the second video frame comprises: inputting a first video frame after replacing into a predetermined neural network model, such that the neural network model extracts a feature point of the first video frame and performs matching based on the feature point of the first video frame and the stored feature point of the second video frame.
  13. The method according to claim 8, wherein said calculating the coordinate mapping relationship between the first video frame and the second video frame in the target screen based on the first image block comprises: obtaining first position coordinates of a matching point in the first image block in the first video frame and second position coordinates of the matching point in the second video frame; and calculating a homography matrix corresponding to the matching point based on the first position coordinates and the second position coordinates, the homography matrix representing the coordinate mapping relationship.
  14. The method according to claim 1, further comprising: obtaining target eye movement data of a plurality of users associated with gazes on a same target object; and analyzing a multi-user gaze behavior based on the target eye movement data corresponding to the plurality of users; wherein said analyzing the multi-user gaze behavior based on the target eye movement data corresponding to the plurality of users comprises: sending the target eye movement data corresponding to the plurality of users to the target screen, to superimpose and present the target eye movement data corresponding to the plurality of users at the target object displayed on the target screen; and/or wherein said analyzing the multi-user gaze behavior based on the target eye movement data corresponding to the plurality of users comprises: analyzing an eye movement trajectory and/or an eye movement point heat map of the plurality of users based on the target eye movement data corresponding to the plurality of users, to obtain a primary viewing position of the plurality of users and a habitual operation process of the plurality of users when performing a predetermined operation; and optimizing a scene based on the primary viewing position of the plurality of users and the habitual operation process of the plurality of users when performing the predetermined operation.
  15. An edge computing device, comprising: a processor; and a memory having a program or instructions executable on the processor stored thereon, wherein the program or instructions, when executed by the processor, implement the human-factors intelligence user gaze analysis method according to any one of claims 1 to 14.
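
The coordinate mapping recited in claims 5 to 7 and 13 reduces to two operations: taking each marker's centre as the mean of its contour points, and estimating a projective (homography) transformation that carries positions on the target object into target-screen coordinates. The following Python sketch illustrates that mapping under assumptions the claims do not make: OpenCV and NumPy perform the estimation, and four marker correspondences are taken as available.

```python
import cv2
import numpy as np


def marker_center(contour):
    """Claim 7: the centre of a marker area is the mean of its contour points."""
    points = contour.reshape(-1, 2).astype(np.float64)
    return points.mean(axis=0)


def gaze_to_screen(marker_centers_frame, marker_centers_screen, gaze_xy):
    """Claims 5, 6 and 13: estimate a homography H from marker positions in the
    scene-camera frame to their known positions on the target screen, then map
    a gaze point through it.

    With H = [[a1, a2, a3], [b1, b2, b3], [c1, c2, c3]],
        w  = c1*x + c2*y + c3            (homogeneous normalisation factor)
        x' = (a1*x + a2*y + a3) / w
        y' = (b1*x + b2*y + b3) / w
    which corresponds to the first, second and third parameter sets of claim 6.
    """
    src = np.asarray(marker_centers_frame, dtype=np.float32)   # >= 4 non-collinear points
    dst = np.asarray(marker_centers_screen, dtype=np.float32)
    H, _ = cv2.findHomography(src, dst, method=cv2.RANSAC)
    if H is None:                                               # degenerate configuration
        return None
    x, y = gaze_xy
    w = H[2, 0] * x + H[2, 1] * y + H[2, 2]
    return ((H[0, 0] * x + H[0, 1] * y + H[0, 2]) / w,
            (H[1, 0] * x + H[1, 1] * y + H[1, 2]) / w)
```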
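
Claims 8 to 13 describe a marker-free variant: an image block around the gaze point is matched against the target-screen frame, the block is enlarged when the first one does not satisfy the predetermined condition, and a homography computed from the matches maps the gaze point. The sketch below is only an approximation of that flow; it substitutes classical ORB features for the predetermined neural network model of claims 11 and 12, and the block half-sizes and match threshold are arbitrary illustrative values.

```python
import cv2
import numpy as np

ORB = cv2.ORB_create(nfeatures=1000)
MATCHER = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
MIN_MATCHES = 4            # illustrative stand-in for claim 9's "predetermined number"


def _gray(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) if image.ndim == 3 else image


def _block_around(frame, gaze_xy, half_size):
    """Claims 9 and 10: an image block of a predetermined range centred on the gaze point."""
    x, y = int(gaze_xy[0]), int(gaze_xy[1])
    h, w = frame.shape[:2]
    x0, y0 = max(0, x - half_size), max(0, y - half_size)
    return frame[y0:min(h, y + half_size), x0:min(w, x + half_size)], (x0, y0)


def map_gaze_by_matching(scene_frame, screen_frame, gaze_xy):
    """Claims 8-10 and 13: map the gaze point through a homography computed from
    feature matches between a gaze-centred block and the target-screen frame,
    enlarging the block when the first attempt fails the predetermined condition."""
    screen_gray = _gray(screen_frame)
    kp_screen, des_screen = ORB.detectAndCompute(screen_gray, None)
    if des_screen is None:
        return None
    for half_size in (100, 200):                       # first range, then a larger second range
        block, (ox, oy) = _block_around(_gray(scene_frame), gaze_xy, half_size)
        kp_block, des_block = ORB.detectAndCompute(block, None)
        if des_block is None:
            continue
        matches = MATCHER.match(des_block, des_screen)
        if len(matches) < MIN_MATCHES:                 # condition of claim 9 not satisfied
            continue
        src = np.float32([[kp_block[m.queryIdx].pt[0] + ox,
                           kp_block[m.queryIdx].pt[1] + oy] for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_screen[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC)  # claim 13's coordinate mapping
        if H is None:                                   # e.g. matches are collinear
            continue
        mapped = cv2.perspectiveTransform(np.float32([[gaze_xy]]), H)
        return tuple(mapped[0, 0])
    return None
```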
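
Claim 14 superimposes several users' target eye movement data on the same target object and derives an eye movement point heat map from which primary viewing positions can be read. A minimal sketch of that aggregation follows; the screen resolution and the Gaussian smoothing width are illustrative parameters rather than values taken from the application.

```python
import cv2
import numpy as np


def gaze_heatmap(gaze_points_per_user, screen_hw=(1080, 1920), sigma=25.0):
    """Claim 14: accumulate multiple users' gaze points on the target screen and
    smooth them into a heat map; the brightest regions indicate primary viewing
    positions across users."""
    heat = np.zeros(screen_hw, dtype=np.float32)
    for user_points in gaze_points_per_user:            # one list of (x, y) per user
        for x, y in user_points:
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < screen_hw[0] and 0 <= xi < screen_hw[1]:
                heat[yi, xi] += 1.0
    heat = cv2.GaussianBlur(heat, (0, 0), sigma)         # spread each fixation point
    peak = heat.max()
    return heat / peak if peak > 0 else heat             # normalised to [0, 1]
```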

Description

FIELD

The present application belongs to the technical field of computer vision, and particularly relates to a human-factors intelligence user gaze analysis method, a human-factors intelligence user gaze analysis apparatus, a human-factors intelligence user gaze analysis system, and an edge computing device.

BACKGROUND

Eye tracking technology is widely applied in psychology, healthcare, advertising analysis, autonomous driving, and other fields. By tracking users' eye movements, it helps understand their attention distribution, decision-making behaviors, reaction speeds, and so on. Conventional eye tracking systems capture the movement trajectories of users' eyes through cameras mounted on displays or devices to analyze their visual focuses. However, this technology mostly relies on data analysis at a later stage: the mapping of eye movement information onto the target screen is annotated manually by annotators, resulting in low efficiency in analyzing users' visual behaviors.

SUMMARY

The present disclosure aims to solve at least one of the technical problems in the related art. To this end, the present disclosure provides a human-factors intelligence user gaze analysis method, a human-factors intelligence user gaze analysis apparatus, a human-factors intelligence user gaze analysis system, and an edge computing device, to improve the efficiency of user visual behavior analysis.

In a first aspect, the present disclosure provides a human-factors intelligence user gaze analysis method. The method includes: collecting visual data of a user's field of view through a camera of a head-mounted device, and collecting eye movement data of the user within the field of view through an eye tracker of the head-mounted device; identifying a target object in the visual data; determining target eye movement data associated with gaze on the target object in the eye movement data; and sending the target eye movement data to a target screen to correspondingly present the target eye movement data on the target object displayed on the target screen.

With the human-factors intelligence user gaze analysis method according to the present disclosure, the visual data of the user's field of view is collected through the camera of the head-mounted device, and the eye movement data of the user within the field of view is collected through the eye tracker of the head-mounted device; the target object in the visual data is identified; the target eye movement data associated with gaze on the target object in the eye movement data is determined; and the target eye movement data is sent to the target screen to correspondingly present the target eye movement data on the target object displayed on the target screen. The embodiment of the present disclosure realizes effective identification of the target object gazed at by the user by collecting the visual data and the eye movement data, and displays the target eye movement data of the user on the target object on the target screen. The present disclosure can display the target eye movement data of the user on the target object on the target screen in real time without manually annotating the user's target gaze data on the target object, so as to perform real-time analysis of the user's visual behavior on the target screen, enhancing the user interaction experience and improving the efficiency of user visual behavior analysis.
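
Read as a processing loop, the method of the first aspect pairs each scene-camera frame with the corresponding eye-tracker sample, identifies the target object, filters the gaze samples that fall on it, and forwards the mapped samples to the target screen. The Python sketch below only pictures that flow; every callable parameter is a hypothetical placeholder for the steps named above, not an interface disclosed in this application.

```python
from typing import Callable, Iterable, Optional, Tuple

Point = Tuple[float, float]


def analyze_user_gaze(
    frames: Iterable[object],                        # visual data: scene-camera frames
    gaze_samples: Iterable[Point],                   # eye movement data, one sample per frame
    detect_target: Callable[[object], Optional[object]],   # identify the target object
    gaze_on_target: Callable[[object, Point], bool],        # does the sample gaze at the target?
    map_to_screen: Callable[[Point, object], Point],        # frame coordinates -> screen coordinates
    present: Callable[[Point], None],                # present the sample on the target screen
) -> None:
    """Forward only the target eye movement data, i.e. the gaze samples that
    fall on the identified target object, to the target screen."""
    for frame, gaze in zip(frames, gaze_samples):
        target = detect_target(frame)
        if target is None or not gaze_on_target(target, gaze):
            continue
        present(map_to_screen(gaze, frame))
```
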
According to an embodiment of the present disclosure, the method further includes: determining a two-dimensional marker to be used based on a scene; and deploying the two-dimensional marker at a key point of the target object, and recording a corresponding relationship among the target object, the key point, and the two-dimensional marker. Said identifying the target object in the visual data includes: locating a position of the target object in the visual data based on a position of the two-dimensional marker in the visual data and the corresponding relationship. In this embodiment, by determining an appropriate two-dimensional marker based on the needs of the scene, the selection of the two-dimensional marker can change with the scene, in such a manner that the two-dimensional marker has a sufficient degree of identification and information capacity in the visual data under different scenes. In addition, by deploying the two-dimensional marker at the key point of the target object and recording the corresponding relationship, necessary reference information is provided for subsequent locating. In this way, the position of the target object can be accurately calculated based on the position of the two-dimensional marker in the visual data, improving the accuracy of locating.

According to an embodiment of the present disclosure, said determining the two-dimensional marker to be used based on the scene includes: determining a size of a blank area in the two-dimensional marker based on the scene; the blank area being an area between an edge of the two-dimensi
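
A sketch of the marker-based locating described in this embodiment is given below. It assumes ArUco-style two-dimensional markers detected with OpenCV, which the application itself does not name, and the deployment table mapping marker ids to key points is a hypothetical example of the recorded corresponding relationship.

```python
import cv2
import numpy as np

# Hypothetical record of the corresponding relationship (target object, key point,
# two-dimensional marker); the ids, names and dictionary below are illustrative only.
DEPLOYMENT = {
    0: ("instrument_panel", "top_left"),
    1: ("instrument_panel", "top_right"),
    2: ("instrument_panel", "bottom_right"),
    3: ("instrument_panel", "bottom_left"),
}


def locate_target(frame_bgr, object_name="instrument_panel"):
    """Locate a target object in a scene-camera frame from the markers deployed
    at its key points (ArUco markers and the OpenCV >= 4.7 API assumed)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is None:
        return None
    key_points = {}
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        record = DEPLOYMENT.get(int(marker_id))
        if record and record[0] == object_name:
            # Take the marker centre (mean of its four corners) as the key-point position.
            key_points[record[1]] = marker_corners.reshape(4, 2).mean(axis=0)
    if len(key_points) < 3:
        # Fewer markers detected than deployed: the embodiment fits the missing
        # positions from the key points' geometric relationship (not shown here).
        return None
    # The non-collinear key points found span the target object's position in the frame.
    return np.array(list(key_points.values()), dtype=np.float32)
```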