CN-117173500-B - Sample generation method, training method, recognition method and device
Abstract
The application provides a sample generation method, a training method, a recognition method and a device. The method comprises the steps of: acquiring a first person image with a target behavior, a second person image without the target behavior, and a target environment image; extracting posture information of the first person image and the second person image; inputting the posture information and random noise conforming to a normal distribution into a human body generation network to generate a virtual human body image conforming to the posture information; inputting the virtual human body image, the target environment image and human body coordinate information into a background fusion network to generate a virtual character working image; generating annotation information for the virtual character working image to obtain the virtual character working image with the annotation information; and generating a sample training set based on the virtual character working image with the annotation information.
Inventors
- LIU PENG
- PANG KUN
- LIU YI
- ZHAO HUI
- CHEN JIANXUE
Assignees
- 国能(天津)港务有限责任公司
Dates
- Publication Date: 20260512
- Application Date: 20230802
Claims (11)
- 1. A sample generation method, comprising: acquiring a first person image with a target behavior, a second person image without the target behavior, and a target environment image; extracting posture information of the first person image and the second person image; inputting the posture information and random noise conforming to a normal distribution into a human body generation network to generate a virtual human body image conforming to the posture information; inputting the virtual human body image, the target environment image and human body coordinate information into a background fusion network to generate a virtual character working image, wherein the background fusion network comprises a second generator and a second discriminator; generating annotation information for the virtual character working image to obtain the virtual character working image with the annotation information, wherein the annotation information comprises the human body coordinate information and behavior information of the virtual human body image in the virtual character working image, the human body coordinate information comprises a coordinate position of the virtual human body image in the target environment image and a width value and a height value of the virtual human body image, and the behavior information is the target behavior of the virtual human body image; and generating a sample training set based on the virtual character working image with the annotation information; wherein the human body generation network comprises a first generator and a first discriminator, and the method further comprises: acquiring a first sample human body image with the target behavior and a second sample human body image without the target behavior; extracting sample real human body images and sample human body posture information from the first sample human body image and the second sample human body image; inputting the sample human body posture information and sample random noise conforming to a normal distribution into the first generator to generate a sample virtual human body image conforming to the sample human body posture information; inputting the sample real human body image or the sample virtual human body image into the first discriminator to output a first probability; updating the weight of the first generator through inverse gradient calculation according to the comparison result between the sample virtual human body image and the corresponding first or second sample human body image; updating the weight of the first discriminator through inverse gradient calculation according to the first probability; and repeating the above steps until the human body generation network converges.
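The alternating updates in claim 1 can be sketched with a toy, fully deterministic NumPy model. This is a minimal illustration only, not the patent's implementation: the linear generator/discriminator, the toy dimensions `POSE_DIM`/`NOISE_DIM`/`IMG_DIM`, and the learning rate are all assumptions standing in for real convolutional networks; the generator weight is updated from the comparison (L2) with the paired real image, and the discriminator weight from the first probability via binary cross-entropy, as the claim describes.

```python
import numpy as np

rng = np.random.default_rng(0)

POSE_DIM, NOISE_DIM, IMG_DIM = 8, 4, 16  # toy sizes, purely illustrative

# Generator: maps [posture, noise] -> flattened "image"; discriminator: image -> P(real).
Wg = rng.normal(0, 0.1, size=(IMG_DIM, POSE_DIM + NOISE_DIM))
wd = rng.normal(0, 0.1, size=IMG_DIM)

def discriminator(img):
    return 1.0 / (1.0 + np.exp(-wd @ img))  # "first probability": P(image is real)

def train_step(pose, real_img, lr=0.05):
    """One update pair from claim 1: generator weight from the comparison result
    with the paired real image; discriminator weight from the first probability."""
    global Wg, wd
    noise = rng.normal(size=NOISE_DIM)          # random noise ~ N(0, 1)
    x = np.concatenate([pose, noise])
    fake = np.tanh(Wg @ x)                      # sample virtual human body image

    # Generator: inverse-gradient step on the L2 comparison loss ||fake - real||^2.
    g_loss = np.mean((fake - real_img) ** 2)
    dWg = np.outer(2.0 / IMG_DIM * (fake - real_img) * (1.0 - fake ** 2), x)
    Wg = Wg - lr * dWg

    # Discriminator: inverse-gradient step on BCE over real/fake probabilities.
    p_real, p_fake = discriminator(real_img), discriminator(fake)
    wd = wd - lr * ((p_real - 1.0) * real_img + p_fake * fake)
    return g_loss

pose = rng.normal(size=POSE_DIM)
real_img = np.tanh(rng.normal(size=IMG_DIM))    # stand-in for a real human body image
losses = [train_step(pose, real_img) for _ in range(300)]
```

Repeating `train_step` until the loss plateaus corresponds to "repeating the steps until the human body generation network converges".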
- 2. The method according to claim 1, characterized in that the method further comprises: acquiring a sample target environment image, a sample real character working image, sample human body coordinate information and a sample virtual human body image, wherein the sample virtual human body image is generated by the human body generation network; inputting the sample target environment image, the sample virtual human body image and the sample human body coordinate information into the second generator to generate a sample virtual character working image; inputting the sample virtual character working image or the sample real character working image into the second discriminator to output a second probability; updating the weight of the second generator through inverse gradient calculation according to the comparison result between the sample virtual character working image and the corresponding sample real character working image, and updating the weight of the second discriminator through inverse gradient calculation according to the second probability; and repeating the above steps until the background fusion network converges.
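The background fusion network of claim 2 learns a seamless blend of the virtual human body into the environment. As an illustrative baseline only (explicitly not the patent's learned method), the human body coordinate information can be read as a paste location plus patch width/height; a hard copy shows what the coordinates address, while the trained generator replaces it with a realistic fusion. All array sizes and pixel values below are made up.

```python
import numpy as np

def naive_composite(env_img, human_img, x, y):
    """Paste a human patch into the environment image at (x, y), clipping at
    the borders. The patent's background fusion network replaces this hard
    paste with a learned blend; this only illustrates how the human body
    coordinate information (position, width, height) is used."""
    out = env_img.copy()
    h, w = human_img.shape[:2]
    y2, x2 = min(y + h, out.shape[0]), min(x + w, out.shape[1])
    out[y:y2, x:x2] = human_img[: y2 - y, : x2 - x]
    return out

env = np.zeros((64, 64, 3), dtype=np.uint8)       # stand-in target environment image
human = np.full((16, 8, 3), 200, dtype=np.uint8)  # stand-in virtual human body patch
fused = naive_composite(env, human, x=30, y=40)
```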
- 3. The method of claim 1, wherein before generating the annotation information for the virtual character working image to obtain the virtual character working image with the annotation information, the method comprises: preprocessing the virtual character working image, wherein the preprocessing comprises random rotation, translation, shading change and/or noise addition.
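The preprocessing of claim 3 can be sketched as a small random-augmentation function. This is a hedged approximation: `np.rot90` and `np.roll` stand in for the arbitrary-angle rotation and sub-pixel translation a real pipeline (e.g. OpenCV) would apply, the input is assumed square, and the magnitude ranges are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def augment(img):
    """Apply the claim-3 preprocessing: random rotation, translation,
    shading (brightness) change, and additive noise."""
    out = img.astype(np.float32)
    out = np.rot90(out, k=int(rng.integers(0, 4)))          # random 90-degree rotation
    out = np.roll(out, shift=int(rng.integers(-5, 6)), axis=1)  # horizontal translation
    out = out * rng.uniform(0.7, 1.3)                       # shading / brightness change
    out = out + rng.normal(0.0, 5.0, size=out.shape)        # additive Gaussian noise
    return np.clip(out, 0, 255).astype(np.uint8)

img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in working image
aug = augment(img)
```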
- 4. A training method of a detection model, characterized in that it uses a sample training set generated by the sample generation method of any one of claims 1 to 3, the detection model comprises a human body detection model and a behavior recognition model, and the method comprises: counting the annotation information of all the virtual character working images in the sample training set, and obtaining the width values and height values of the labeled target frames according to the annotation information; performing K-Means cluster analysis on the width values and height values of the target frames to obtain width values and height values of a plurality of anchor frames; inputting the virtual character working images in the sample training set into the human body detection model, and obtaining prediction frames and confidences of human bodies at different scales based on the width values and height values of the plurality of anchor frames; calculating a prediction frame loss value and a confidence loss value, and performing gradient calculation according to the prediction frame loss value and the confidence loss value to update the weight of the human body detection model; repeating the above training steps until the prediction frame loss value and the confidence loss value converge or a designated number of training iterations is reached; inputting the virtual character working image with the target behavior into the human body detection model, and outputting a human body image; inputting the human body image into the behavior recognition model, and outputting a third probability; calculating a third loss value from the third probability, and performing gradient calculation according to the third loss value to update the weight of the behavior recognition model; and repeating the above training steps until the third loss value converges or the designated number of training iterations is reached.
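The K-Means anchor step of claim 4 is the standard YOLO-style anchor derivation and can be sketched directly. Assumptions: Euclidean distance over (width, height) pairs (detector pipelines often use an IoU distance instead), a deterministic area-quantile initialization, and synthetic box statistics in place of the real annotation information.

```python
import numpy as np

def kmeans_anchors(wh, k, iters=50):
    """Plain K-Means over labeled-box (width, height) pairs, returning k anchor
    sizes sorted by area, as in claim 4's cluster analysis."""
    areas = wh[:, 0] * wh[:, 1]
    # Deterministic init: spread initial centers across the area quantiles.
    idx = np.argsort(areas)[np.linspace(0, len(wh) - 1, k).astype(np.int64)]
    centers = wh[idx].astype(np.float64)
    for _ in range(iters):
        # Assign each box to its nearest anchor, then recompute anchor centers.
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = wh[labels == j].mean(axis=0)
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]

# Toy width/height statistics standing in for the counted annotation info.
rng = np.random.default_rng(1)
small = rng.normal([20, 40], 2, size=(50, 2))
large = rng.normal([80, 160], 5, size=(50, 2))
anchors = kmeans_anchors(np.vstack([small, large]), k=2)
```

The resulting anchor widths/heights seed the prediction frames the human body detection model regresses at each scale.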
- 5. A recognition method, characterized in that it uses a detection model trained by the training method of the detection model according to claim 4, the method comprising: acquiring a monitoring video of a target environment, decoding the monitoring video, and acquiring a single-frame image; performing adaptive picture scaling and pixel normalization preprocessing on the single-frame image; and inputting the preprocessed single-frame image into the detection model, and outputting the probability that the target behavior exists.
- 6. The method according to claim 5, characterized in that after inputting the preprocessed single-frame image into the detection model and outputting the probability that the target behavior exists, the method comprises: comparing the probability with a preset threshold; and if the probability is greater than the preset threshold, generating alarm information.
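The inference-side preprocessing and alarm logic of claims 5 and 6 can be sketched as follows. Assumptions: a 64-pixel target side, grey padding value 114, and a nearest-neighbour resize standing in for whatever interpolation the real pipeline uses; the threshold value is illustrative.

```python
import numpy as np

def letterbox(img, size=64, pad=114):
    """Adaptive picture scaling (claim 5): shrink so the longer side fits
    `size`, keep the aspect ratio, and pad the remainder."""
    h, w = img.shape[:2]
    s = size / max(h, w)
    nh, nw = int(round(h * s)), int(round(w * s))
    rows = (np.arange(nh) / s).astype(np.int64).clip(0, h - 1)
    cols = (np.arange(nw) / s).astype(np.int64).clip(0, w - 1)
    canvas = np.full((size, size, 3), pad, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = img[rows][:, cols]
    return canvas

def preprocess(frame):
    # Pixel normalization to [0, 1] after adaptive scaling.
    return letterbox(frame).astype(np.float32) / 255.0

def should_alarm(prob, threshold=0.5):
    # Claim 6: generate alarm information when the probability exceeds the threshold.
    return prob > threshold

frame = np.zeros((48, 96, 3), dtype=np.uint8)  # stand-in decoded single-frame image
inp = preprocess(frame)
```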
- 7. A sample generation apparatus, comprising: a first acquisition module, used for acquiring a first person image with a target behavior, a second person image without the target behavior, and a target environment image; an extraction module, used for extracting posture information of the first person image and the second person image; a first generation module, used for inputting the posture information and random noise conforming to a normal distribution into a human body generation network to generate a virtual human body image conforming to the posture information; a second generation module, used for inputting the virtual human body image, the target environment image and human body coordinate information into a background fusion network to generate a virtual character working image, wherein the background fusion network comprises a second generator and a second discriminator; an annotation module, used for generating annotation information for the virtual character working image to obtain the virtual character working image with the annotation information, the annotation module being further used for acquiring the human body coordinate information and behavior information of the virtual human body image in the virtual character working image, wherein the human body coordinate information comprises a coordinate position of the virtual human body image in the target environment image and a width value and a height value of the virtual human body image, and the behavior information is the target behavior of the virtual human body image; and a third generation module, used for generating a sample training set based on the virtual character working image with the annotation information; wherein the human body generation network comprises a first generator and a first discriminator, and the apparatus is further used for: acquiring a first sample human body image with the target behavior and a second sample human body image without the target behavior; extracting sample real human body images and sample human body posture information from the first sample human body image and the second sample human body image; inputting the sample human body posture information and sample random noise conforming to a normal distribution into the first generator to generate a sample virtual human body image conforming to the sample human body posture information; inputting the sample real human body image or the sample virtual human body image into the first discriminator to output a first probability; updating the weight of the first generator through inverse gradient calculation according to the comparison result between the sample virtual human body image and the corresponding first or second sample human body image; updating the weight of the first discriminator through inverse gradient calculation according to the first probability; and repeating the above steps until the human body generation network converges.
- 8. A training device for a detection model, characterized in that it uses a sample training set generated by the sample generation method of any one of claims 1 to 3, the detection model comprises a human body detection model and a behavior recognition model, and the training device comprises: a statistics module, used for counting the annotation information of all the virtual character working images in the sample training set, and obtaining the width values and height values of the labeled target frames according to the annotation information; a cluster analysis module, used for performing K-Means cluster analysis on the width values and height values of the target frames to obtain width values and height values of a plurality of anchor frames; a first input module, used for inputting the virtual character working images in the sample training set into the human body detection model, and obtaining prediction frames and confidences of human bodies at different scales according to the width values and height values of the plurality of anchor frames; a first calculation module, used for calculating a prediction frame loss value and a confidence loss value, and performing gradient calculation according to the prediction frame loss value and the confidence loss value to update the weight of the human body detection model; a first repeating module, used for repeating the training steps until the prediction frame loss value and the confidence loss value converge or a designated number of training iterations is reached; a second input module, used for inputting the virtual character working image with the target behavior into the human body detection model and outputting a human body image; a third input module, used for inputting the human body image into the behavior recognition model and outputting a third probability that the target behavior exists; a second calculation module, used for calculating a third loss value according to the third probability, and performing gradient calculation according to the third loss value to update the weight of the behavior recognition model; and a second repeating module, used for repeating the training steps until the third loss value converges or the designated number of training iterations is reached.
- 9. A recognition device, characterized in that it uses a detection model trained by the training method of the detection model according to claim 4, the device comprising: a second acquisition module, used for acquiring a monitoring video of a target environment and decoding the monitoring video to acquire a single-frame image; a preprocessing module, used for performing adaptive picture scaling and pixel normalization preprocessing on the single-frame image; and a fourth input module, used for inputting the preprocessed single-frame image into the detection model and outputting the probability that the target behavior exists.
- 10. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, performs the method of any of claims 1 to 6.
- 11. A storage medium storing a computer program executable by one or more processors for implementing the method of any one of claims 1 to 6.
Description
Sample generation method, training method, recognition method and device
Technical Field
The application relates to the technical field of sample generation, and in particular to a sample generation method, a training method, a recognition method and a device.
Background
In the field of operation risk monitoring, the traditional approach is to check for safety risks through inspections by management staff and review of on-site videos. Although such inspection methods have a certain safety-management effect, they suffer from low working efficiency and inaccurate data. With the rapid development of deep learning, intelligent monitoring methods based on deep learning have been widely applied to operation risk monitoring. However, deep-learning-based methods require a large amount of data to train a detection model to an ideal detection and recognition accuracy, so a convenient and rapid way of constructing a data set is of great safety and economic significance for operation risk monitoring. In the related art, differences exist between the virtual character features learned from virtual character training samples and the real character features learned from real character training samples, which affects the training effect of the detection model.
Disclosure of Invention
The application provides a sample generation method, a training method, a recognition method and a device to solve the above problems.
The application provides a sample generation method, which comprises the following steps: acquiring a first person image with a target behavior, a second person image without the target behavior, and a target environment image; extracting posture information of the first person image and the second person image; inputting the posture information and random noise conforming to a normal distribution into a human body generation network to generate a virtual human body image conforming to the posture information; inputting the virtual human body image, the target environment image and human body coordinate information into a background fusion network to generate a virtual character working image; generating annotation information for the virtual character working image to obtain the virtual character working image with the annotation information; and generating a sample training set based on the virtual character working image with the annotation information.
In some embodiments, the human body generation network includes a first generator and a first discriminator, and the method comprises: acquiring a first sample human body image with the target behavior and a second sample human body image without the target behavior; extracting a sample real human body image and sample human body posture information from the first sample human body image and the second sample human body image; inputting the sample human body posture information and sample random noise conforming to a normal distribution into the first generator to generate a sample virtual human body image conforming to the sample human body posture information; inputting the sample real human body image or the sample virtual human body image into the first discriminator to output a first probability; updating the weight of the first generator through inverse gradient calculation according to the comparison result between the sample virtual human body image and the corresponding first or second sample human body image, and updating the weight of the first discriminator through inverse gradient calculation according to the first probability; and repeating the above steps until the human body generation network converges.
In some embodiments, the background fusion network comprises a second generator and a second discriminator, and the method comprises: acquiring a sample target environment image, a sample real character working image, sample human body coordinate information and a sample virtual human body image, wherein the sample virtual human body image is generated by the human body generation network; inputting the sample target environment image, the sample virtual human body image and the sample human body coordinate information into the second generator to generate a sample virtual character working image; inputting the sample virtual character working image or the sample real character working image into the second discriminator to output a second probability; updating the weight of the second generator through inverse gradient calculation according to the comparison result between the sample virtual character working image and the corresponding sample real character working image, and updating the weight of the second discriminator through inverse gradient calculation according to the second probability; and repeating the above steps until the background fusion network converges.
In some embodiments, before generating annotation information for the virtual character working image to obtain the virtual character working image with the annotati