CN-116189287-B - Method, system, equipment and storage medium for detecting small copy action


Abstract

The invention provides a method, a system, a device and a storage medium for detecting whether a person in a video is performing a cheat-sheet reading action (rendered in the machine-translated title as a "small copy" action, i.e., reading answers from a hidden crib sheet). The method comprises: detecting whether the user's face is present in the current video stream frame; when a face is present, acquiring face key point information and determining from it whether the user is speaking; detecting the user's gaze angle in the current frame and determining from it whether the user is performing a cheat-sheet gazing action; and determining that the user is performing a cheat-sheet reading action when the user is both speaking and performing a cheat-sheet gazing action. By combining face key points with gaze tracking, the method identifies cheat-sheet reading in video in real time, provides an important reference for subsequent video review, and avoids misjudgments caused by drift in the user's gaze or head pose.

Inventors

JIANG YIJIE
YE JUNKE
Zhong Longshen
ZHAO QINGLI

Assignees

广发银行股份有限公司信用卡中心 (China Guangfa Bank Co., Ltd. Credit Card Center)

Dates

Publication Date: 20260505
Application Date: 20221227

Claims (8)

1. A method for detecting whether a person in a video is performing a cheat-sheet reading action, comprising: detecting whether a user's face is present in a current video stream frame; when a face is present in the current video stream frame, acquiring face key point information; determining whether the user is speaking according to the acquired face key point information; detecting the gaze angle of the user in the current video stream frame; determining whether the user is performing a cheat-sheet gazing action according to the detected gaze angle; and, when the user is determined to be speaking and to be performing a cheat-sheet gazing action, determining that the user is performing a cheat-sheet reading action; wherein the face key point information comprises at least mouth key point information, and determining whether the user is speaking according to the acquired face key point information specifically comprises: determining the mouth width and mouth height of the user according to the acquired mouth key point information; determining the mouth opening ratio of the user according to the mouth width and mouth height; and determining whether the user is speaking according to the determined mouth opening ratio; wherein determining the mouth opening ratio and determining whether the user is speaking specifically comprise: determining the video stream frames covered by a preset sliding time window, these frames consisting of the current video stream frame and several frames before and after it; determining the mouth opening ratio of the user in each frame covered by the sliding time window according to the user's mouth width and mouth height in that frame; determining the standard deviation of the user's mouth opening ratio over the frames covered by the sliding time window; and judging whether this standard deviation is greater than a preset threshold and, if so, determining that the user is speaking.
2. The method according to claim 1, wherein determining the mouth opening ratio of the user in each frame covered by the sliding time window according to the user's mouth width and mouth height in that frame specifically comprises: determining the mouth width and mouth height of the user in each frame and determining the mouth opening ratio according to the formula $i = \frac{h}{w}$, where $i$ is the mouth opening ratio, $w$ is the mouth width and $h$ is the mouth height of the user; and determining the standard deviation of the mouth opening ratio over the frames covered by the sliding time window specifically comprises: determining the standard deviation over the $N$ frames covered by the sliding time window $t_w$ according to the formula $s = \sqrt{\frac{1}{N}\sum_{k=t}^{t+N}\left(i_k - \bar{i}\right)^2}$ with $i_k = \frac{h_k}{w_k}$, where $s$ is the standard deviation of the mouth opening ratio, $w_k$ and $h_k$ are the user's mouth width and mouth height in frame $k$, $i_k$ is the user's mouth opening ratio in frame $k$, $\bar{i}$ is the mean mouth opening ratio of the user over the $N$ frames covered by $t_w$, $t$ is the first frame covered by $t_w$, and $t+N$ is the last frame covered by $t_w$.
3. The method according to claim 1 or 2, wherein detecting the gaze angle of the user in the video stream frame and determining whether the user is performing a cheat-sheet gazing action according to the detected gaze angle specifically comprises: determining the horizontal gaze offset angle and the vertical gaze offset angle of the user in the current video stream frame; judging whether the horizontal gaze offset angle is smaller than a preset horizontal gaze minimum or larger than a preset horizontal gaze maximum (the first judging condition); judging whether the vertical gaze offset angle is smaller than a preset vertical gaze minimum or larger than a preset vertical gaze maximum (the second judging condition); and, if the current video stream frame meets the first or the second judging condition, judging whether several frames following the current frame also meet the first or the second judging condition: if so, determining that the user is performing a cheat-sheet gazing action, and if not, determining that the user is not performing a cheat-sheet gazing action.
4. The method according to claim 3, wherein determining the horizontal and vertical gaze offset angles of the user in the current video stream frame comprises: cropping a face image and an eye image from the current video stream frame and inputting both into a gaze tracking model, so that the gaze tracking model extracts feature vectors from the face image and the eye image respectively, encodes the extracted feature vectors, and compresses the encoded feature vectors to obtain the horizontal and vertical gaze offset angles of the user in the current video stream frame.
5. The method according to claim 4, wherein the gaze tracking model comprises, connected in sequence, a backbone feature extraction network, a secondary feature extraction network, a feature encoding module and a plurality of fully connected layers; the backbone feature extraction network consists of a Focus module and a CSP network and extracts feature vectors from the images; the secondary feature extraction network consists of an FPN network, a PAN network and a fully connected layer and further processes the feature vectors extracted by the backbone feature extraction network; the feature encoding module is a Transformer encoder module that encodes the feature vectors processed by the secondary feature extraction network; and the fully connected layers compress the output of the feature encoding module and finally output a two-dimensional feature representing the horizontal and vertical gaze offset angles.
6. A system for detecting whether a person in a video is performing a cheat-sheet reading action, for implementing the method of claim 1, comprising: a face detection module for detecting whether a user's face is present in a current video stream frame; a face key point acquisition module for acquiring face key point information when a face is present in the current video stream frame; a user speaking detection module for determining whether the user is speaking according to the acquired face key point information; a gaze angle detection module for detecting the gaze angle of the user in the current video stream frame and determining, according to the detected gaze angle, whether the user is performing a cheat-sheet gazing action; and a cheat-sheet reading detection module for determining that the user is performing a cheat-sheet reading action when the user is speaking and is performing a cheat-sheet gazing action.
7. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the method for detecting whether a person in a video is performing a cheat-sheet reading action according to any one of claims 1 to 5.
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for detecting whether a person in a video is performing a cheat-sheet reading action according to any one of claims 1 to 5.
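The speaking test of claims 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the ratio orientation i = h / w, the window length and the threshold are hypothetical values, mouth width and height are assumed to be already measured from the mouth key points, and the sketch uses a trailing window over the most recent frames rather than the window centered on the current frame that claim 1 describes (in a live stream a centered window would delay each decision by half the window). The spread is the population standard deviation, dividing by N.

```python
import statistics
from collections import deque
from typing import Deque, Optional


def mouth_opening_ratio(width: float, height: float) -> float:
    """Mouth opening ratio i = h / w (orientation is an assumption)."""
    if width <= 0:
        raise ValueError("mouth width must be positive")
    return height / width


class SpeakingDetector:
    """Flags speaking when the std of the mouth opening ratio over a window exceeds a threshold.

    window_size and threshold are hypothetical defaults, not values from the patent.
    """

    def __init__(self, window_size: int = 15, threshold: float = 0.05) -> None:
        self.window_size = window_size
        self.threshold = threshold
        # deque with maxlen drops the oldest ratio automatically (trailing window).
        self.ratios: Deque[float] = deque(maxlen=window_size)

    def update(self, width: float, height: float) -> Optional[bool]:
        """Add one frame's mouth measurements; return None until the window is full."""
        self.ratios.append(mouth_opening_ratio(width, height))
        if len(self.ratios) < self.window_size:
            return None
        # Population standard deviation (divides by N) of the ratios in the window.
        return statistics.pstdev(list(self.ratios)) > self.threshold


# Usage: a mouth alternating between nearly closed and open reads as speech.
detector = SpeakingDetector(window_size=4, threshold=0.05)
for w, h in [(50.0, 5.0), (50.0, 25.0), (50.0, 5.0), (50.0, 25.0)]:
    result = detector.update(w, h)
print(result)  # True
```

Using the spread of the ratio rather than its level means a user whose mouth rests slightly open is not flagged; only a ratio that keeps changing within the window counts as speech.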

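Claim 5 names a Focus module at the front of the backbone but does not define it. Assuming it is the YOLOv5-style Focus layer (a common companion of CSP, FPN and PAN blocks), its distinctive step is a space-to-depth slicing that moves each 2x2 pixel neighborhood into the channel dimension before a convolution. The NumPy sketch below shows only that slicing; the convolution and the rest of the gaze tracking model are omitted, and NumPy stands in for whatever deep-learning framework an actual implementation would use.

```python
import numpy as np


def focus_slice(x: np.ndarray) -> np.ndarray:
    """Space-to-depth slicing of a YOLOv5-style Focus layer (assumed design).

    Takes a (C, H, W) array with even H and W and returns a (4C, H/2, W/2)
    array: each 2x2 pixel neighborhood is spread across four channel groups,
    so spatial resolution halves without discarding any pixel. A real Focus
    layer follows this with a convolution, omitted here.
    """
    if x.ndim != 3 or x.shape[1] % 2 or x.shape[2] % 2:
        raise ValueError("expected a (C, H, W) array with even H and W")
    return np.concatenate(
        [
            x[:, ::2, ::2],    # even rows, even columns
            x[:, 1::2, ::2],   # odd rows, even columns
            x[:, ::2, 1::2],   # even rows, odd columns
            x[:, 1::2, 1::2],  # odd rows, odd columns
        ],
        axis=0,
    )


# Usage: a single-channel 4x4 image becomes four 2x2 channels.
image = np.arange(16).reshape(1, 4, 4)
out = focus_slice(image)
print(out.shape)  # (4, 2, 2)
```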
Description

Method, system, equipment and storage medium for detecting small copy action

Technical Field

The present invention relates to the field of video detection, and in particular to a method, system, device and storage medium for detecting cheat-sheet reading actions.

Background

Credit card activation is generally reviewed by video, so a bank's review system must process a large volume of video, and during the review a person may read answers from a cheat sheet. At present the video review step relies mainly on customer service staff spotting such behavior by eye from experience; because their skill levels vary, misjudgments and missed detections are common in practice. Moreover, video review must happen in real time, so cheat-sheet reading cannot be identified after the fact, which makes review harder; and training a model directly on images showing cheat-sheet reading would, to reach the required accuracy, demand a large set of data samples.

Disclosure of Invention

The invention aims to overcome at least one defect of the prior art by providing a method, system, device and storage medium for detecting cheat-sheet reading actions, addressing the increased difficulty and cost of real-time video review caused by the inability to identify automatically whether a person in the video is reading from a cheat sheet.
The technical solution adopted by the invention is as follows. The invention provides a method for detecting whether a person in a video is performing a cheat-sheet reading action, comprising: detecting whether a user's face is present in the current video stream frame; when a face is present, acquiring face key point information and determining from it whether the user is speaking; detecting the user's gaze angle in the current frame and determining from it whether the user is performing a cheat-sheet gazing action; and determining that the user is performing a cheat-sheet reading action when the user is both speaking and performing a cheat-sheet gazing action. The method detects the face in each video stream frame in real time, acquires its key points, and analyzes from them whether the user is speaking; speaking indicates that the user is answering the questions of the video review. At the same time, the method detects the user's gaze angle and uses it to determine the user's gaze region, and thereby whether the user is looking not at the camera region but at a cheat-sheet region; if so, the user is performing a cheat-sheet gazing action. If the user is both answering the review questions and performing a cheat-sheet gazing action, the user is finally determined to be reading from a cheat sheet. This result provides an important reference for real-time video review: the user's application can be decided in real time on the basis of the detection result, so that video review is both real-time and accurate.
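The gaze-region check (detailed in claim 3) and the final decision rule can be sketched as follows. The angle bounds, their sign convention (negative pitch taken to mean looking down) and the number of following frames k are hypothetical placeholders, not values from the patent, and the sketch assumes that "several following frames" means every one of the next k frames must also be off-camera. Per-frame (yaw, pitch) angles are taken as given, for example from a gaze tracking model such as the one the claims describe.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class GazeBounds:
    """Hypothetical preset gaze ranges in degrees; inside both ranges counts as on-camera."""

    yaw_min: float = -15.0
    yaw_max: float = 15.0
    pitch_min: float = -10.0
    pitch_max: float = 10.0

    def off_camera(self, yaw: float, pitch: float) -> bool:
        # First judging condition: horizontal offset outside [yaw_min, yaw_max].
        horizontal = yaw < self.yaw_min or yaw > self.yaw_max
        # Second judging condition: vertical offset outside [pitch_min, pitch_max].
        vertical = pitch < self.pitch_min or pitch > self.pitch_max
        return horizontal or vertical


def is_cheat_sheet_gazing(
    angles: Sequence[Tuple[float, float]], t: int, bounds: GazeBounds, k: int = 3
) -> bool:
    """True if frame t and each of the k following frames meet either condition.

    Assumption: every following frame must qualify; if fewer than k frames
    follow t, the sketch returns False instead of waiting for more input.
    """
    if t + k >= len(angles):
        return False
    return all(bounds.off_camera(yaw, pitch) for yaw, pitch in angles[t : t + k + 1])


def is_cheat_sheet_reading(speaking: bool, gazing: bool) -> bool:
    """Final rule: flag only when the user is speaking AND cheat-sheet gazing."""
    return speaking and gazing


# Usage: the user looks down (pitch -25 degrees) for five frames while speaking.
angles = [(0.0, -25.0)] * 5
gazing = is_cheat_sheet_gazing(angles, t=0, bounds=GazeBounds(), k=3)
print(is_cheat_sheet_reading(speaking=True, gazing=gazing))  # True
```

Requiring the off-camera condition to persist across consecutive frames filters out brief glances away from the camera; a larger k trades detection latency for fewer false alarms.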
By combining face key points with gaze tracking, the method avoids misjudgments caused by drift in the user's gaze or head pose. The face key point information comprises at least mouth key point information, from which it is determined whether the user is speaking: specifically, the user's mouth width and mouth height are determined from the acquired mouth key point information, the user's mouth opening ratio is determined from the mouth width and mouth height, and whether the user is speaking is determined from that ratio. Because the mouth opening ratio is computed from the mouth width and height, the influence of head-pose drift is avoided, so whether the user is speaking, and hence whether the user is answering the video review questions, can be judged accurately. Further, determining the mouth opening ratio of the user according to the mouth width and mouth height, and determining whether the user is speaking according to the determined ratio, specifically comprises determi